A FIRST COURSE IN 


STATISTICAL METHOD


BY 


G. IRVING GAVETT 

Associate Professor of Mathematics, University 

of Washington 



First Edition 
Fourth Impression 


McGRAW-HILL BOOK COMPANY, Inc. 

NEW YORK AND LONDON 

1925 



Copyright, 1925, by the 
McGraw-Hill Book Company, Inc. 


PRINTED IN THE UNITED STATES OF AMERICA 



THE MAPLE PRESS COMPANY, YORK, PA 



PREFACE 


This book is a growth from actual teaching experience in a 
course in Statistical Method with a large number of students in 
classroom and laboratory. This experience has spread over a 
period of about ten years. The content of the course has gradually
been changed and the methods of presentation improved.

It is the aim of the book to serve as a text for a foundation 
course in Statistical Method for the different departments interested.
The departments are mainly Mathematics, Business
Administration, Sociology, Psychology, Commercial Engineering, 
Fisheries, and the Natural Sciences. 

The presentation of the subject, as evolved, is that which the 
author has found to be workable. Only those things are presented
which are deemed to be fundamental. After mastery of
this text the student is well prepared to go on with specialized
courses in any of the departments mentioned. He is also prepared
to take up the more purely mathematical pursuit of the
theory of probability and its applications to statistics. 

No advanced mathematics is necessary for a satisfactory reading
of the text. A very elementary idea of derivative is of assistance
in the treatment of least squares adjustment and its application
in deriving Pearson’s coefficient of correlation. For the
student unfamiliar with the idea of derivative, a simple explanation,
satisfactory for the purpose desired, is given in Appendix D.
However, by taking the meaning and results of a simple derivative
on say-so, the application is readily made in the text.

In several cases, simplified derivations of formulas for statistical
constants are presented. So far as the author has been able
to find out, he has given the only absolutely correct definitions of 
median, quartiles, and other division points in print, that provide 
a logical derivation of a simple formula for determining them. 

The exercises and illustrative examples are taken almost 
entirely from actual data. The exercises at the end of each 
chapter are merely suggestive. It has been found that the most 
satisfactory way of getting problems is to devise them as the
work proceeds. In this way the content of the problems is easily
made to vary from term to term. The large newspaper almanacs, 
“The United States Statistical Abstract,” and the “Commerce 
Yearbook” furnish an abundance of statistical material. 

The author has used height and weight of freshman students at 
the University of Washington to illustrate the method of making 
frequency tables and their use in finding statistical constants. In 
order that each student may have his own material with which 
to work, lengths and breadths of a properly selected sample of 
leaves from a tree are found to give the most readily obtainable 
and satisfactory data. 

The importance of the application of the principles of statistical 
method to most problems of human concern, where conclusions 
are to be drawn from a mass of observed data, is constantly being 
more thoroughly recognized. The necessity for the business man, 
as well as the biologist, the sociologist, and the psychologist, to 
be acquainted with the fundamentals of statistical method is 
becoming greater every day. Without a proper understanding 
of the underlying ideas involved, erroneous and sometimes 
dangerous results are employed. 

The author’s purpose in preparing this book has been to provide 
for the needs of his own students as shown by experience. If, in 
addition, the work may prove helpful to other instructors in 
giving a satisfactory foundation in the methods of statistics, he 
will be doubly repaid for his effort. 

The author owes much to his coworkers for suggestions as to 
form and to his wife for putting the manuscript in shape for the 
publishers. 

G. I. G. 

Seattle, Wash. 

November, 1925.



CONTENTS

Preface

Chap.
I. Introduction
II. Tabulation, Frequency Distribution
III. Graphical Representation. Frequency Graphs
IV. Averages
V. Dispersion
VI. Skewness
VII. Probable Error
VIII. Curve Fitting
IX. Correlation, Regression
X. Logarithmic Graphical Representation
XI. Index Numbers

Appendix A. Logarithms
Appendix B. Permutations, Combinations, Binomial Expansion
Appendix C. Laws of Probability
Appendix D. Derivatives and Integrals

Index



















A FIRST COURSE IN 

STATISTICAL METHOD 

CHAPTER I 
INTRODUCTION 

The noun statistics is either a singular noun or a plural 
noun according to the way in which it is used. 

1. Definitions.—The “Standard Dictionary” gives the
following definitions: 

1. (Noun plural.) Numerical facts, collectively, pertaining to a 
body of things, especially where systematically gathered by direct 
enumeration and collated; specifically, such facts relating to a numerous
body of people, as of a nation, state or social organization; as
statistics of population; statistics of agriculture; church statistics; 
statistics for a census report. 

2. (Noun singular.) The science that deals with the collection, 
classification, and tabulation, of such facts, especially as a branch of 
sociology. 

The definitions given in Webster’s “New International Dictionary”
are as follows:

1. (Construed as singular noun.) The science of the collection and
classification of facts on the basis of relative number or occurrence as
a ground for induction; systematic compilation of instances for the
inference of general truths.

2. (Construed as plural noun.) Classified facts respecting the condition
in various respects of the people in a state, or respecting any
particular class or interest; especially those facts which can be stated 
in numbers, or tables of numbers. 

2. Discussion.—The definition of the plural noun as given by 
Webster is too narrow, as it confines itself to facts concerning the 
condition of people. That in the Standard includes facts pertaining
to things in general. On the other hand, Webster’s definition
of the singular noun seems more explicit. Statistics is not merely
the science of collecting, classifying, and tabulating facts. Its
methods when applied to a set of facts properly selected from a
large body of facts enable us to draw conclusions by induction and
establish general laws in regard to the large body of facts. The 
methods of statistics are employed by and are of great value to 
the physicist, the chemist, the biologist, the anthropologist, the 
meteorologist, the psychologist, the sociologist, the economist, 
the business man, and many others. The physicist and the 
chemist can frequently, by the experimental methods of the 
laboratory, segregate a given cause and observe directly its 
effect. The sociologist and the economist, on the other hand,
can seldom isolate a given cause and determine directly the effect
of that cause alone. They must make their observations on things as
they exist in experience and not as they would be in a laboratory
experiment. The methods of statistics are especially
applicable to this situation. Even the observations of the 
physicist and the chemist are subject to human error and to 
variable causes not completely under control in the laboratory, 
and statistical methods become valuable again. 

3. Yule’s Definitions.—With these remarks in mind look at 
definitions given by Yule in his “Introduction to the Theory of 
Statistics.” His work is one of the standard treatments of 
statistics in the English language. On page 5 he says: 

By statistics we mean quantitative data affected to a marked extent 
by a multiplicity of causes. 

By statistical methods we mean methods specially adapted to the 
elucidation of quantitative data affected by a multiplicity of causes. 

By theory of statistics we mean the exposition of statistical methods. 

4. Multiplicity of Causes.—These definitions indicate that 
statistical methods are especially applicable in those cases where 
an observed effect is due to a “multiplicity of causes.” 

An example is the yield of wheat per acre as affected by amount 
of rainfall. The yield of wheat per acre is due to a multiplicity 
of causes besides amount of rainfall. Among them are temperature,
attacks by wheat rust, amount of fertilizer used, etc. The
amount of fertilizer used may depend on its price and on the
price of wheat. The price of wheat may depend on foreign
demand. Foreign demand depends on something else. Statistical
methods make it possible to study the effect of amount of
rainfall as segregated from the other causes. 

Another example is the effect of gravitation on the speed of a 
falling body. Friction of the air and a multiplicity of other things
besides gravitation affect the speed of a falling body. Statistical
methods may be used. However, in this case the physicist may 
control the problem in his laboratory and reduce the effects of 
causes other than gravitation to nil or to negligible amounts. 
He can then observe directly the effect of gravitation. 

5. Comparison.—One of the great purposes in a statistical
investigation is comparison. One may wish to compare growth 
of population in one country with growth of population in another 
country, distribution of income in a certain country at one date 
with distribution of income at another date, heights of fathers 
and heights of their sons. These and many others are statistical 
problems. 

6. Variables.—In each case the problem deals with variables.
A variable is a quantity which changes in magnitude. It may
change from one time to another, from one individual to another,
or from one place to another. Distribution of income in the
United States varies from one time to another. A variable
which varies with time is called a historical variable. Height of
fathers varies from one individual to another. Population of
countries varies from one place to another.

Heights of men may vary through all possible numbers between
certain limits. Such a variable is said to be a continuous variable.
The number of spots coming up in throws of three dice varies from
3 to 18, but only by integers. Such a variable is said to be
a discrete variable. The observed values of the variable are
called variates, observations, size of item, etc.
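The discrete variable just mentioned can be checked by direct enumeration. The following is a minimal sketch in Python, assuming ordinary six-sided dice; the names `throws`, `counts`, and `totals` are chosen here for illustration and do not come from the text.

```python
from itertools import product
from collections import Counter

# Every possible throw of three six-sided dice (6**3 = 216 throws).
throws = product(range(1, 7), repeat=3)

# Count how often each total number of spots occurs.
counts = Counter(sum(throw) for throw in throws)

# The distinct values the variable can take, in order of size.
totals = sorted(counts)

print(totals[0], totals[-1])                      # smallest and largest totals
print(all(isinstance(t, int) for t in totals))    # only whole numbers occur
```

No value between two neighboring integers, such as 10.5, can ever appear among the totals, which is what distinguishes this variable from a continuous one such as height.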

7. Average.—In order to make definite comparisons, it is 
necessary to get numerical values of certain constants which are 
functions of the particular variable in question. As an illustration
take tree height as the variable. The heights of trees in one
forest may be compared with the heights of trees in another forest 
by saying that the trees in the first forest are taller than those in 
the second. This does not necessarily mean that every tree in 
the first forest is taller than the tallest tree in the second. It is 
understood to mean that the trees of the first forest average taller 
than those of the second. A constant quantity called average 
is determined as representing tree height for the first forest. The 
same sort of average is determined for tree height of the second 
forest. It is difficult for the mind to grasp, at one time, the large 
number of tree heights in either forest. The two averages are 
readily compared. The average is a type form used to represent 
the height of the group as a whole. 




8. Dispersion.—In comparing the tree heights in the two 
forests, it may happen that in the first forest, having the greater 
average tree height, there will be found very short trees and very 
tall trees, differing greatly from the average. At the same time 
in the other forest, the trees may be found to be nearly all of the 
same height, differing but little from the average. This might be 
of importance in estimating the amount of timber per acre for 
lumber. This scattering of values of the variable on each side of 
the average is called dispersion, or variability. A numerical 
measure of dispersion must be devised in order to more completely 
compare the tree heights of the two forests. 
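The comparison of the two forests may be sketched numerically. The tree heights below are hypothetical figures invented for illustration, not data from the text. The arithmetic mean is assumed as the average, and the standard deviation is assumed as one possible numerical measure of dispersion; the text at this point says only that some such measure must be devised.

```python
from statistics import mean, pstdev

# Hypothetical tree heights in feet, invented for illustration only.
forest_1 = [40, 55, 70, 85, 100, 115, 130]   # very short and very tall trees
forest_2 = [78, 79, 80, 80, 80, 81, 82]      # trees nearly all of the same height

for name, heights in (("first", forest_1), ("second", forest_2)):
    average = mean(heights)        # type form representing the whole group
    dispersion = pstdev(heights)   # scattering of heights about the average
    print(name, round(average, 1), round(dispersion, 1))
```

In this sketch the first forest has the greater average but also much the greater dispersion, which is the situation described in the paragraph above.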

9. Tabulation and Classification.—If the height of each tree, 
taken in order, is measured and recorded, the resulting records of 
the two forests will be difficult to compare. This leads at once 
to tabulation and classification of the values of the variable into 
series, ordered according to size. When a set of observed values 
of a variable (variates) is arranged according to size in regular 
order from smallest to largest, or vice versa, the set or group is 
said to be arrayed. The group then forms an array. 

10. Sampling.—If there is a very large number of trees in the 
forest, it may become impossible or at least impracticable to 
measure every tree height. Then some method must be devised 
for selecting a tree here and a tree there, such that the average 
and the dispersion of tree heights of the group selected may be 
expected to be the same or nearly the same as for all the trees of 
the forest. This process is called sampling. The selected set of 
values of the variable is called a sample. 

11. Statistical Regularity.—It is by virtue of what is known as 
statistical regularity that the study of a random sample of the 
entire group determines the properties of the entire group. Suppose
it is desired to find the average height of all 7,000 students in
a given university. Because of this law of statistical regularity, 
it is necessary to measure only a random sample of say about 700 
heights. If the students all pass through a certain door and every 
tenth one is measured, it should accomplish the purpose. In 
selecting every tenth student, it should make no appreciable 
difference whether one starts with the first, the second, the fifth, 
or any one of the first ten. It should also give a fair sample if 
an alphabetical list of students is used and every tenth one in the 
list is chosen and called in to be measured. Again one might begin 
with the first in the list, or the second, or any one of the first ten. 



For the sample to be truly random, the names of the 7,000
students may be written each on a card, the cards placed in a 
sack and thoroughly shaken up. Then a card may be drawn at 
random by a blindfolded person. The remaining cards in the 
sack are to be thoroughly shaken up again, and another one drawn 
at random. If the process is repeated until the required number of 
names for a good sample be thus selected, the heights of the students
thus selected should constitute a random sample and be
representative of the heights of the entire student body. 
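The two ways of selecting about 700 of the 7,000 students can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the students are represented simply by the numbers 1 to 7,000, and the function names `every_tenth` and `drawn_from_sack` are hypothetical.

```python
import random

N_STUDENTS = 7000   # size of the whole group, as in the text
STEP = 10           # measure every tenth student

# Hypothetical roll of students, numbered in door order or alphabetical order.
roll = list(range(1, N_STUDENTS + 1))

def every_tenth(roll, start):
    """Take every tenth student, beginning with the start-th (1 to 10)."""
    return roll[start - 1::STEP]

def drawn_from_sack(roll, size, seed=None):
    """Draw names at random without replacement, like cards from a sack."""
    rng = random.Random(seed)
    return rng.sample(roll, size)

systematic = every_tenth(roll, start=5)
random_sample = drawn_from_sack(roll, size=700, seed=1925)

print(len(systematic), len(random_sample))
```

Changing `start` to any value from 1 to 10 yields a different set of 700 students, which corresponds to the remark that it should make no appreciable difference where one begins.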

Various other ways of selecting a random sample may be 
devised. The necessary prerequisite is that the sample shall be 
representative of the entire group. The probability of large 
error diminishes with increase of the number of items in the 
sample. Use of a number of samples also diminishes the probability
of large error. The mathematical theory of sampling
cannot be taken up here. 1 

12. Classes.—Instead of tabulating and classifying all the 
tree heights of the forest, only those of the sample need to be 
used. When tabulating the values of the variable in the sample, 
they are usually divided into groups called classes. Each class 
covers the same range of values of the variable. This range is 
called the class interval. The number of variates in each class is 
called the class frequency. Such a table when completed is called
a frequency table.
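The making of a frequency table can be sketched in a few lines. The weights used are the first twelve weights of the continued part of Table I. The starting point of 120 lb, the class interval of 10 lb, and the convention that each class includes its lower limit but not its upper limit are assumptions made here for illustration; the function name `frequency_table` is likewise hypothetical.

```python
from collections import Counter

def frequency_table(variates, lower, interval):
    """Return (class start, class end, class frequency) for each class.

    Assumes every variate is at least `lower`; each class includes its
    lower limit and excludes its upper limit.
    """
    counts = Counter((v - lower) // interval for v in variates)
    return [
        (lower + k * interval, lower + (k + 1) * interval, counts[k])
        for k in range(int(max(counts)) + 1)
    ]

# First twelve weights, in pounds, from the continued part of Table I.
weights = [178, 127, 126, 145, 135, 150, 144, 156, 137, 121, 159, 145]

for start, end, frequency in frequency_table(weights, lower=120, interval=10):
    print(f"{start}-{end}: {frequency}")
```

A class with no variates, here 160 to 170, is still listed with frequency zero, so that every class covers the same range of values.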

13. Graphs.—Graphical presentation of statistical facts is used 
very extensively. The construction and interpretation of graphs 
or charts is a complete study by itself. Many of the problems of 
the statistician can hardly be solved without graphs; most of them 
are much clarified by the use of graphs. By using graphs many 
comparisons are easily made by the eye. A study of economic 
situations and business conditions is much facilitated by use of 
graphs. The graphs of frequency tables are called frequency 
graphs or histograms. The graph of a historical variable is called 
a historigram. Do not confuse this term with histogram. 

1 G. Udny Yule, “Introduction to the Theory of Statistics,”

14. Deviations, Symmetry.—In studying dispersion, the
amount each variate differs from the average is found. These
differences are called deviations from the average. If the variate
is less than the average, the deviation is negative; if greater than
the average, the deviation is positive. If, for each positive
deviation, there is a negative deviation of the same size, the
distribution with respect to the average is symmetrical. The
occurrence of asymmetry leads to the study of skewness. In
the first case the frequency graph is symmetrical, usually bell-shaped.
If skewness is present, the frequency graph is bell-shaped
with one extremity pulled out along the base line farther
than the other. Besides these two type forms of frequency curves,
there are a number of others which can be thoroughly understood
only by a study of curve-fitting.
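The idea of deviations and of symmetry can be illustrated with a small sketch. The variates are hypothetical and the arithmetic mean is assumed as the average. The test of symmetry used is the literal one stated above, that each positive deviation is matched by a negative deviation of the same size; it is suitable only for exact values such as these whole numbers.

```python
from statistics import mean

def deviations_from_average(variates):
    """Deviation of each variate from the arithmetic average."""
    average = mean(variates)
    return [v - average for v in variates]

def is_symmetrical(deviations):
    """True when each positive deviation is matched by an equal negative one."""
    return sorted(deviations) == sorted(-d for d in deviations)

# Hypothetical variates, invented for illustration; both sets average 68.
balanced = deviations_from_average([62, 65, 68, 71, 74])
skewed = deviations_from_average([60, 62, 64, 66, 88])

print(balanced, is_symmetrical(balanced))
print(skewed, is_symmetrical(skewed))
```

In both sets the deviations add to zero, but only in the first are they balanced pair by pair. In the second, one large positive deviation is offset by several small negative ones, which is the kind of lopsidedness that leads to the study of skewness.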

15. Correlation.—If a comparison be made of two variables
such as length and breadth of a given variety of tree leaves, or
amount of employment and savings-bank deposits, it would be
expected that with increase of leaf length there will, in general,
be an increase of leaf breadth, or that with change of amount of
employment there will be a corresponding change, in the same
direction, in savings-bank deposits. If the supply of a commodity
increases, it would be expected that the price would go
down. When corresponding values of two variables seem to
have some connection so that when one increases, the other has a
tendency to increase, or so that when one increases, the other has
a tendency to decrease, there is said to be correlation between the
two variables. The numerical measure of the extent to which
certain values of one variable tend to be associated with a selected
value of the other is called coefficient of correlation.
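A coefficient of this kind can be sketched numerically. The form used below is the usual product-moment (Pearson) coefficient; it is assumed here for illustration, since the derivation belongs to a later chapter. All the figures are hypothetical and the function name `correlation` is chosen for this sketch.

```python
from math import sqrt

def correlation(xs, ys):
    """Product-moment coefficient of correlation of paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical leaf lengths and breadths: both tend to increase together.
lengths = [5.0, 6.0, 7.0, 8.0, 9.0]
breadths = [2.1, 2.4, 3.1, 3.3, 3.9]

# Hypothetical supply and price: one tends to decrease as the other increases.
supply = [10, 20, 30, 40, 50]
price = [9.0, 7.5, 6.5, 4.0, 3.0]

print(round(correlation(lengths, breadths), 2))   # near +1
print(round(correlation(supply, price), 2))       # near -1
```

A value near +1 corresponds to the first kind of connection described above, and a value near -1 to the second.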

16. Use of Formulas.—The above considerations will be
followed to a large extent in the succeeding chapters.
Some modifications may be employed and some topics be introduced
that have not been mentioned. Formulas will be derived
for finding the numerical values of these various statistical
constants. Many formulas have limitations in their application.
Sometimes the method of derivation of a formula makes its
limitations obvious. Knowledge of the application of a formula
cannot be considered complete unless its limitations are known.
Do not follow a formula blindly. Know what it means, what it
signifies, what knowledge it is intended to convey. If possible
have a thorough understanding of its derivation. Otherwise
incorrect conclusions may be drawn. In such case results may
be obviously wrong, or they may not be detected until irreparable
damage is done. Thus distrust of statistics and statistical method
is aroused.

17. Accuracy, Errors.—Other sources of distrust are inaccurate
data and inaccurate computations. Absolutely exact measurements
of physical things are unattainable. If a surveyor
measures the length of a base line with the utmost care, he gets a
number of different results. This is due to the imperfections of
the apparatus with which the measuring is done and to the
inability of man to apply the method always in exactly the same
way. The exact value of the measurement is unknown. What is
known as the most probable value is used. If there is no reason
for thinking one measure is better than another, the arithmetic
mean is used as the most probable value. The actual measures,
or observations, will be some of them larger and some of them
smaller than the arithmetic mean. The difference between an
observed value and the arithmetic mean is called an error or
deviation from the mean. These errors, being some positive and
some negative, tend to offset each other, and are called compensating
errors. On the other hand, if the line had been measured
with a hundred-foot tape which was too short, then the longer the
line measured the greater the error due to defective tape length.
Such errors are called cumulative errors. Compensating errors
can be largely taken care of by having a great number of measurements.
Cumulative errors become worse the oftener it is necessary
to apply the measuring stick.
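The difference between the two kinds of error can be shown by a small numerical sketch. Every figure in it is an assumption made for illustration: a base line of 1,000 ft., chance errors of at most 0.05 ft. either way, and a nominal hundred-foot tape that is really 99.9 ft. long.

```python
import random

rng = random.Random(0)   # fixed seed so that the sketch is repeatable
TRUE_LENGTH = 1000.0     # hypothetical base line, in feet

# Compensating errors: each measure is off by a small chance amount,
# as likely to be positive as negative.
measures = [TRUE_LENGTH + rng.uniform(-0.05, 0.05) for _ in range(10_000)]
mean_measure = sum(measures) / len(measures)

# Cumulative error: every application of the short tape is counted as 100 ft.
TAPE_NOMINAL, TAPE_ACTUAL = 100.0, 99.9

def taped_length(true_length):
    """Length reported when a line is measured with the short tape."""
    return true_length / TAPE_ACTUAL * TAPE_NOMINAL

print(abs(mean_measure - TRUE_LENGTH))     # very small: errors largely offset
print(taped_length(1000.0) - 1000.0)       # about 1 ft. too long
print(taped_length(5000.0) - 5000.0)       # about 5 ft. too long
```

Taking more measures brings the arithmetic mean closer to the true length in the first case, while in the second case no number of repetitions removes the error; only a correct tape does.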

18. Some so-called statisticians have been known to say that
inaccuracy in the data makes no difference if there is only a
sufficiently large amount of data. The old man who bought
eggs at 25 cts. a dozen and sold them at 24 was asked how he could
make any money. He replied that he could not, except that he
sold a very great many. If it be true that when a woman is
asked her age she is inclined to understate it, the correct average
age of women cannot be obtained by asking even a very large
number of women how old they are. But if it be true that some
women are inclined to understate and others to overstate their
ages, it may happen that one can arrive very closely at the correct
average by getting a sufficiently large number of women to
state their ages.


19. Besides the two sorts of errors just discussed, there
are what may be called mistakes. These should not be allowed
to pass. A constant checking of work in every way possible
is necessary. It is not a crime to make a mistake, but it is,
at least, very reprehensible to let the mistake stand. A mistake
in an engineer’s computations sometimes results in disaster
with great loss of life.





20. Standard or Degree of Accuracy.—In every problem or 
investigation involving numerical quantities, there should be a 
standard or degree of accuracy established. Every item should 
be brought to this standard. This does not mean that every 
measurement must be the most accurate possible. It may serve 
the purpose of a statistical problem to know the population of a 
state to the nearest ten thousand or it may be needed to the 
nearest hundred. The number of students in the various classes 
of the university might be needed to the nearest unit. The 
standard of accuracy needed is determined by the purpose for 
which the data are to be used. The standard of accuracy is 
limited by the closeness with which the quantities involved can 
be measured. 

21. Recording of Numbers.—Numbers are recorded so that
each shows the standard of accuracy used. If populations of
cities are used to the nearest thousand, then each number is
recorded with three ciphers at the end to fill out to the decimal
point. Thus:

City          Population
A ........... 427,000
B ........... 536,000
C ........... 720,000
D ........... 329,000


Sometimes these ciphers that serve merely as fillers are written
smaller than the other figures. In the above table the use of
fillers could be avoided by indicating the standard of accuracy in
the column heading.

City          Population,
              in Thousands
A ........... 427
B ........... 536
C ........... 720
D ........... 329

If the lengths of table tops are being measured to the nearest
hundredth of a foot, then record each result to show that degree
of accuracy. If one table top seems to measure 5 ft., do not
record it as 5, but write the result 5.00, using ciphers to fill in to
hundredths place.
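The same rules of recording can be followed when numbers are written out by machine. The sketch below is an illustration only; the function names `record_hundredths` and `to_nearest_thousand` are hypothetical, and the exact population count of 427,312 is an invented figure, not one from the text.

```python
def record_hundredths(measure_ft):
    """Write a length so that it shows accuracy to the hundredth of a foot."""
    return f"{measure_ft:.2f}"

def to_nearest_thousand(population):
    """Bring a population to the standard of the nearest thousand."""
    return round(population, -3)

# A table top that seems to measure 5 ft. is recorded as 5.00, not 5.
print(record_hundredths(5))

# A hypothetical exact count, recorded with ciphers as fillers
# and then in thousands.
count = 427_312
rounded = to_nearest_thousand(count)
print(f"{rounded:,}")        # with three ciphers as fillers
print(rounded // 1000)       # under a heading "in Thousands"
```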

22. False Accuracy.—Certain numerical processes may be
carried out to indicate a higher degree of accuracy than that
actually attained. This should not be done. Usually, the result
is no more accurate than the least accurate element in the computation.
This may be illustrated by an example in multiplication.
Suppose the length and the breadth of a table top,
measured to the nearest hundredth of a foot, are 5.21 and 2.34 ft.,
respectively. Find the area in square feet. The data have
three-figure accuracy. No greater than three-figure accuracy can
be expected in the result. Multiplying in the ordinary way gives:

      5.21
      2.34
   -------
      2084
     1563*
    1042**
   -------
   12.1914

Using three-figure accuracy this result would be read 12.2 sq. ft. 

The length given as 5.21 ft. means that the actual length was 
somewhere between 5.205 and 5.215 ft. In the same way the 
actual breadth is somewhere between 2.335 and 2.345 ft. The 
product of the smaller length and breadth, 5.205 and 2.335, is 
12.153675. The product of the greater length and breadth, 
5.215 and 2.345, is 12.229175. The area then lies between these 
two products. They agree if the right-hand figures are dropped 
till only three-figure accuracy is indicated. The result is 12.2, 
which agrees to three figures with the product of the original 
dimensions as measured. 
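The limits just computed can be verified by exact decimal arithmetic. The sketch below assumes, as the text does, that each dimension may be in error by as much as half a unit in the hundredths place; the helper name `three_figures` is hypothetical and is written only for values between 10 and 100.

```python
from decimal import Decimal

length, breadth = Decimal("5.21"), Decimal("2.34")
half_unit = Decimal("0.005")   # half a unit in the hundredths place

product = length * breadth
low = (length - half_unit) * (breadth - half_unit)
high = (length + half_unit) * (breadth + half_unit)

def three_figures(x):
    """Round to three figures; valid only for values from 10 up to 100."""
    return x.quantize(Decimal("0.1"))

print(low, product, high)
print(three_figures(low), three_figures(product), three_figures(high))
```

All three values round to 12.2, so three figures are as many as the data will support.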

23. By inspecting the original multiplication of 5.21 by 2.34,
it is readily seen that the two right-hand figures, 14, are very
likely incorrect and that the 9 is somewhat unreliable. It is not
known whether the 5.21 is too large or too small. So the last figure,
4, of the first partial product is uncertain. The same thing is
true of the last figures of the other partial products. Moreover,
if the measurement of length and breadth had been carried out
to the nearest thousandth of a foot, one would not have the
slightest idea what figures would replace the *’s shown in the
example. It is thus very evident that the last two figures, 14, of
the product are almost sure to be wrong and that the 9 is at least
somewhat uncertain. Writing the result as 12.1914 sq. ft. tells a
lie. It leads one to believe that there is six-figure accuracy in the
result, where there really is only three-figure accuracy. Similar
considerations apply to division, square root, and other processes
of computation.

24. Aids to computation should be freely used. Crelle’s
multiplication tables, Barlow’s tables of squares, square roots,
etc., and tables of logarithms are very useful. The common
pocket slide rule gives very quick results where only three-figure
accuracy is necessary. This is usually the case in computations
of percentages. If any extended statistical work is to be
done, a computing machine is almost indispensable.

Table I.—Weight and Height of Freshman Men at the University of
Washington, 1923-1924
Weight in pounds. Height in inches



136 

103 

114 

135 

157 


149 
151 

144 

148 

133 

118 

136 

142 

163 

145 

141 
130 
138 

150 

149 

155 

153 

124 
129 
173 

155 

165 

142 
129 

134 

150 

125 
105 

141 
125 

142 
149 
160 
141 
145 


75.0 

67.0 

65.5 
71.0 
OS.O 

6S.0 
70.0 
70 0 

68.5 

71.5 

68.0 

72.0 

67.5 
69.0 
72.0 

OS.O 
70 0 

67.5 
67.0 

66.5 

70.0 
65.0 

69.5 

70.5 

67.5 

70.0 

68.5 
68.5 
67.0 

68.5 

66.0 

69.5 
68.0 
71.0 
69.5 

69.5 
72.0 
67.0 
67.0 
71.0 

69.0 

67.0 

67.5 

70.5 
70.0 

67.5 
64.0 
67.5 
70.0 
67.5 


167 

147 

130 
153 
160 

147 

157 

165 

122 

142 

131 
141 
136 

152 
151 

143 

159 
163 
145 

147 

136 

138 

148 

160 
162 

143 

150 

130 
141 
141 

138 

133 

141 

112 

131 

151 
163 
124 
126 

130 

172 

131 
170 
135 

132 

140 

132 

130 

160 

153 


70.0 

68.5 

65.5 

67.5 

67.5 

72.0 

69.5 
69.0 
65.0 

67.5 

69.5 

68.5 

65.5 
68.0 

69.5 

68.0 

71.0 

75.0 

67.0 

68.5 

69.0 

65.5 
67.0 
71.0 
70.0 

69.0 

72.0 

66.5 
68.0 

67.5 

68.0 

66.5 
72 0 

60.5 
67.0 

69.0 

70.5 

65.5 
64.0 

63.5 

69.0 

65.0 

64.0 

64.5 

66.0 

69.0 

68.0 

65.0 

69.0 

69.0 


68.0 

68.5 
66.0 
72.0 

63.5 

68.0 

63.5 
64.0 
68.0 

72.5 

67.5 

65.5 

69.5 


66.5 



8. 

68.5 



65.5 
68 

.5 

67.5 
63 



66.0 

63.5 
68.0 

70.5 

65.5 



180 

137 

147 

155 

178 

169 

134 

131 
164 
149 
155 

169 

137 

162 

130 

132 

122 

137 

154 
141 
137 

155 
120 

135 
128 

140 

141 
139 
130 

152 

153 

164 

146 

13S 

12S 

142 




72.5 
67.0 

69.5 

66.5 

67.5 

70.5 

67.5 

70.5 

66.5 

71.5 
72.0 
69.0 
72.0 

69.5 
66.0 
70.0 
69.0 
69.0 

67.0 

68.0 

64.5 
67.0 
72.0 

69.0 

66.5 

65.5 
70.0 
68.0 

67.0 

68.5 

69.5 

65.5 
64.0 


150 

153 

126 

143 

139 

169 

135 

119 

190 

141 

15S 

146 

122 

176 

139 


163 

121 

110 

130 

148 

136 

116 

109 

136 

161 


174 

134 

132 

122 

191 

110 

140 

124 

129 

140 


70.0 

68.0 

69.0 

71.0 

72.0 

71.5 
70.0 
66.0 
69.0 
70.0 

70.0 

70.0 

68.5 

74.5 

70.5 

65.0 

68.0 

71.0 

67.5 
67.0 

71.0 

63.5 
67.0 

64.5 
69.0 

66.5 
67.0 

70.5 
71.0 
70.0 

69.0 

64.5 

62.5 
69.0 

67.5 

71.0 

65.5 
71.0 

68.5 

67.5 

70.5 
68.0 
67.0 

63.5 

72.5 

64.0 

67.0 

6S.0 

65.0 

69.0 



170 

135 

136 

137 
148 


163 

139 
122 
134 

140 

132 

120 

148 

129 
152 

150 

125 

178 

112 

130 

128 

115 

144 

173 

138 


176 

133 

136 

135 

145 


70.0 

65.0 

64.5 

63.5 
69.0 

63.0 

64.5 
70.0 

70.5 
62.0 

70.0 

67.0 

62.5 

67.5 
67.0 

68.5 

65.5 
68.0 
67.0 
67.0 

64.0 
64.0 
72 0 
67.0 

68.5 

68.0 

65.5 

65.5 
72.0 
68.0 

69.0 

67.0 

70.5 

63.5 
69.0 

67.5 

66.0 

66.5 
73.0 

67.5 

72.0 

68.0 

62.0 

67.5 

67.5 

66.5 
69.0 
69.0 

70.5 

68.0 































Table I.—(Continued)

Weight  Height  Weight  Height  Weight  Height  Weight  Height  Weight  Height  Weight  Height
 178     69.5    127     66.5    126     67.5    145     69.5    135     70.0    150     68.5
 144     67.0    156     70.5    137     67.0    121     68.0    159     70.0    145     68.5
 145     67.5    120     66.0    150     67.5    167     67.0    165     67.0    154     69.5
 146     69.0    145     69.0    128     67.5    136     70.0    144     68.5    171     72.0
 159     73.0    131     66.0    138     68.0    125     68.5    126     66.5    136     67.5
 156     70.5    142     69.0    125     63.5    114     64.5    129     68.5    107     68.5

148 

69.0 


wnci' 

117 

65 . 5 

149 

67.5 

142 

72.0 

143 

68.5 

151 

69.5 

205 

69.5 

140 

70.0 

127 

69.5 

132 

63.5 

137 

67.5 

131 

69.5 

136 

67.5 

143 

70.0 

159 

66.0 

EE21 


152 

70.0 

153 

69.5 

142 

66.0 

129 

63.0 

112 

63.5 

161 


160 

67.5 

150 

70.0 

142 

69.0 

116 

65.5 

132 

69.0 

141 

67.0 

172 

70.5 

167 

74.5 


68.5 

145 

71.0 

146 

70.0 

115 

66.5 

140 

68.5 

152 

69.0 

117 

67.5 

162 

73 0 

116 

64.0 

157 

07.5 

131 

68.0 

119 

67.5 

145 

69.0 

134 

67.0 

140 

66.0 

124 

66.0 

154 

08.0 

148 

69.0 

137 

66.5 

132 

67.0 

146 

65.0 

156 

69.0 

116 

69.0 

120 

65.5 



120 

65.0 

152 

68.5 

208 


123 

66.5 

136 

70.0 

123 

, 67.5 

143 

70.0 

144 

68.5 

132 


145 

69.0 

144 

72.0 

124 


145 

69.0 

128 

70.0 

210 


139 

68.5 

132 

67.5 

199 

68.0 

162 

66.5 

170 

68.5 

126 

68.0 

147 

66.5 

131 

66.5 

179 

71.0 

137 

66.5 

115 

64.5 

134 

68.0 

153 

70.0 

194 

68.5 

149 

69.5 

122 

64.0 

130 

67.0 

130 

65.0 

133 

67.0 

111 

65.0 


EHE1I 

156 

68.0 

135 

70.0 


67.5 

174 

71.0 

122 

68.5 

140 

72.5 

194 

70.5 

150 

67.0 

143 

67.5 

155 

67.5 

124 

63.0 

119 


172 

75.0 

147 

72.0 

166 

68.5 

136 

68.5 

145 

72.5 

144 


155 

71.0 

142 

68.5 

148 

67.5 

158

72.0 

125 

71.0 

124 

69.5 

155 

71.0 

112 

67.0 

162 


122 

69.0 

122 


117 

65.0 

115 

69.0 

152 

70.0 

157 


158 

74.5 

133 


137 

66.0 

174 

69.5 

127 

66.0 

131 

69.0 

130 

65.0 

159 


162 

71.5 

148

68.0 

143 

66.0 

139 

65.5 

149 

70.0 

100 

66.5 

154 

71.5 

119 

65.5 

147 

68.0

142 

68.5 

130 

70.0 

125 

71.5 

wm 

66.0 

156 

71.0 

130 

66.0 

142 

68.0 

135 

61.0 

126 

67.5 

148 

68.0

150 

68.0 

155

71.0 


66.5 

153 

70.5 

168 

73.5 

129 

66.0 

163 

70.0 

145

67.5 

145 

66.5 

145 

67.0 

149 

70.0 

157 

69.0 

153 

67.0 

110 

67.0 

185 

66.5 

141 

67.0 

142 

68.0

141 

65.5 

122 

67.0 

133 

65.0 

126 

70.5 

142 

68.5 

167 

70.0 

175 

69.0 

159 

71.0 

161 

72.5 

135 

65.5 

122 

66.5 

153 

71.0 

115 

67.5 

153 

70.0 

125 

66.0 

132 

66.0 

179 

69.5 

152 

67.5 

144 

64.0 

Ml 

69.0 

107 

67.5 

187 

70.5 

135 

69.5 

120 
/V tap 

66.5 

136 

67.5 

123 

62.0 

112 

64.0 

152 

70.5 

126 

67.5 

137 

72.5 

138 

67.0 

146 

64.0 

182 

66.5 

136 

62.5 

143 

69.5 

16! 

69.0 

150 

64.0 

144 

70.5 

140 

67.5 

122 

wm 

137 

67.5 

136 

68.0 

131 

69.0 

122 

67.0 

145 

70.5 

163 


126 

65.5 

111 

1 Ort 

61.0 

139 


157 

68.5 

1 IS 

67.5 

148 


125 

65.0 

139 

1 OA 

66.5 

173 


no 

62.0 

161 

69.0 

149 

70.5 

151 

69.5 

130 

68.5 

148 


120 

66.5 

161 

72.0 

126 

67.5 

140 

69.0 

139 

1 A A 

68.5 

143 


169 

68.0 

145 

72.5 

162 

70.0 

wm 

70.5 

149 

67.0 

165 


153 

68.0 

109 

62.0 

163 

69.0 

iffi 

69.5 

170 

lie 

72.0 

194 

69.5 

136 

66.5 

156 

67.0 

139 

70.0 

170 

71.5 

145 

1 CO 

66.5 

152 

68.0 

141 

66.0 

130 

67.0 

158 

68.0 

M2 

67.5 

lo3 

67.5 

165 

69.0 

131 

69.0 

129 

67.0 

123 

68.5 

136 

66.0 

138 

1 o a 

64.0 

aa r- 

142 

69.0 

153 

66.0 

125 

66.0 


73.0 

126 

63.5

1-5D 

ioe 

69.5 

muiM 


155 

68.0 

148 

67.5 

146 

70.5 

138 

69.5 

lo5 

68.5 



129 

68.0 

130 

68.5 

145 

ifl 

153 

70.5 

loo 

111 

66.0 
1JA A 

!W41 


153 

68.0 

147 

69.0 

126 


125 

69.5 

1*4 1 

69.0 

176 

68.5

125 

65.0 

172 

72.0 

163 

72.5











































Table II.—Weights and Heights of 629 Freshman Men at the University of Washington

[Correlation table: weight, in pounds (rows), against height, in inches (columns), with a column and a row of totals. The cell entries are illegible.]












































































































Table II.—(Continued)

[Remaining columns of the correlation table: weight, in pounds (rows), against height, in inches (columns). The cell entries are illegible.]






























































































Table III.—Monthly and Annual Precipitation¹ at Seattle

Year     Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.   Annual

1900     3.04   4.35   4.45   1.55   3.73   2.51   0.66   0.30   0.72   4.16   3.80   7.21   36.48
1901     4.26   4.26   1.62   3.86   1.44   1.90   0.35   0.13   2.30   1.44   6.17   2.45   30.18
1902     5.25   8.10   4.19   2.20   1.86   1.71   2.01   0.48   2.15   2.71   6.30   8.82   46.78
1903     3.12   1.45   6.10   1.56   2.39   1.55   1.28   0.50   3.18   1.94   7.34   4.14   34.55
1904     4.26   5.31   6.22   3.08   0.34   1.23   0.58   0.07   0.22   2.32   8.81   5.29   37.73
1905     5.61   2.70   3.45   ...    3.37   3.03   0.36   0.44   ...    4.29   2.68   5.62   34.35
1906     5.03   4.60   0.89   ...    2.64   1.97   ...    0.15   3.31   3.16   7.67   6.80   36.67
1907     4.18   3.87   1.12   ...    0.98   ...    ...    0.80   3.39   0.67   4.73   6.33   29.10
1908     4.10   4.24   2.54   ...    3.19   ...    ...    0.82   0.23   2.34   4.60   3.61   28.25
1909     6.90   4.35   1.08   ...    1.60   0.64   ...    0.27   1.10   2.86   9.11   2.69   31.72
1910     5.08   ...    1.80   ...    1.88   0.82   ...    0.17   ...    4.02   8.47   3.47   34.20
1911     3.67   1.42   ...    ...    2.48   ...    ...    0.13   3.27   ...    3.26   3.75   21.69
1912     4.52   3.11   1.79   1.73   1.64   2.76   1.15   2.49   ...    3.97   6.82   4.43   35.14
1913     4.89   1.34   1.55   0.83   1.37   1.71   ...    0.45   2.37   ...    4.74   2.61   24.59
1914     9.82   1.93   ...    3.31   0.74   1.75   ...    ...    1.42   4.37   5.28   1.39   31.43
1915     6.35   2.76   1.72   2.91   1.72   ...    ...    ...    0.65   3.00   5.66   ...    33.83
1916     4.32   6.85   5.45   1.98   1.56   1.82   1.93   ...    ...    1.18   4.58   4.13   34.61
1917     2.02   1.43   2.96   4.48   0.83   3.70   ...    ...    1.29   0.16   2.70   9.21   28.90
1918     2.94   4.81   3.92   0.96   1.19   ...    1.38   1.12   ...    3.46   3.81   5.04   29.21
1919     7.95   3.77   1.84   3.20   2.08   ...    ...    0.08   2.03   1.59   4.13   ...    31.34
1920     3.92   0.34   2.82   3.46   0.96   1.93   ...    1.15   2.34   4.19   4.42   5.C8   32.21
1921     5.56   4.82   3.06   1.76   1.93   1.29   0.18   1.61   1.84   3.91   6.60   7.25   39.81
1922     1.89   1.74   4.45   2.53   1.08   ...    ...    ...    1.19   2.37   1.46   7.37   25.27
1923     7.51   2.72   1.37   1.67   1.45   ...    0.68   1.98   1.37   2.05   2.06   3.31   27.18

Average. 4.84   3.56   2.77   2.11   1.77   1.42   ...    ...    1.61   2.63   5.22   5.10   32.26

Bold-faced figures indicate greatest and least monthly and annual amounts.

¹ Annual Meteorological Summary of the Weather Bureau Office at Seattle, Wash., M. B. Summers, Meteorologist.

















































Table IV.—Monthly and Annual Mean Temperature¹ at Seattle

Year     Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.   Annual

1900     43.8   42.3   49.9   51.8   56.0   61.9   65.0   62.0   58.6   51.8   44.4   45.8   52.8
1901     40.2   42.9   46.0   47.8   55.4   57.1   61.6   65.7   58.0   57.2   49.2   42.4   52.0
1902     39.8   46.4   45.0   ...    56.5   60.6   63.8   64.4   58.4   54.3   45.8   41.8   52.2
1903     42.9   40.2   42.5   47.9   54.6   61.9   62.0   63.2   57.7   53.6   ...    42.6   51.2
1904     41.8   40.2   42.6   52.6   55.1   59.7   64.2   63.6   61.0   54.2   49.4   43.9   52.4
1905     40.8   43.3   49.8   52.7   ...    59.2   64.6   62.6   59.4   48.0   45.2   42.6   51.9
1906     42.4   43.8   45.5   53.0   54.6   58.0   67.7   65.0   58.9   54.0   45.2   42.4   52.5
1907     33.8   41.8   43.2   49.1   57.0   59.4   64.2   61.3   58.0   53.2   48.0   43.4   51.1
1908     ...    ...    44.2   49.0   51.6   58.8   65.1   62.2   ...    50.8   48.8   39.5   50.9
1909     34.2   42.0   44.1   47.7   52.3   59.4   ...    61.6   ...    52.2   45.5   30.4   49.7
1910     39.0   38.3   47.6   49.4   57.0   57.5   62.6   60.2   58.0   52.8   45.5   42.8   50.9
1911     38.0   39.8   45.8   46.6   52.7   ...    ...    62.6   57.1   52.5   44.7   41.2   50.2
1912     42.6   43.9   44.3   ...    57.0   60.0   63.4   62.2   59.2   49.9   46.2   41.6   51.5
1913     36.6   ...    41.9   ...    ...    59.5   63.4   64.8   58.8   50.1   46.2   42.4   50.6
1914     43.2   42.3   47.6   51.4   57.3   58.9   64.2   63.2   56.7   54.6   47.2   39.7   52.2
1915     40.6   44.5   ...    52.6   56.0   59.8   64.3   66.8   59.1   53.7   43.7   42.0   52.8
1916     ...    41.9   44.4   49.0   52.0   58.8   61.1   63.6   58.8   49.1   43.0   38.0   49.2
1917     ...    39.3   ...    46.8   52.4   57.2   ...    65.2   58.9   52.5   49.6   45.0   50.8
1918     43.7   ...    ...    ...    52.4   61.6   ...    62.6   62.2   53.4   45.8   ...    51.6
1919     41.4   ...    44.7   49.6   53.6   57.5   ...    63.0   59.6   48.5   44.9   38.6   50.4
1920     ...    ...    44.4   45.6   51.6   58.6   ...    ...    57.8   50.1   ...    43.4   50.6
1921     ...    42.9   44.6   47.5   53.0   59.8   60.8   62.0   57.0   53.2   ...    39.1   50.5
1922     35.5   39.4   41.5   46.6   54.5   60.8   62.9   62.7   59.8   53.5   43.6   38.4   49.9
1923     ...    37.3   ...    ...    54.1   60.6   64.4   65.7   60.8   54.6   47.4   42.2   51.9

Average. 39.7   41.4   44.9   49.4   54.4   59.3   63.5   63.4   58.8   52.4   46.2   41.5   51.3

¹ Annual Meteorological Summary at the Weather Bureau Office at Seattle, Wash., M. B. Summers, Meteorologist.


































Table V.—Number of Heads and Tails Falling in Each of 500 Throws of Seven Dimes

Heads Tails   Heads Tails   Heads Tails   Heads Tails   Heads Tails   Heads Tails   Heads Tails   Heads Tails   Heads Tails   Heads Tails

2 

5 

4 

3 

3 

4 

3 

4 

4 

3 

1 


3 

4 

4 

3 

3 

4 

3 

4 

2 

5 

2 

5 

3 

4 

2 

5 

1 

6 

4 


5 

2 

4 

3 

3 

4 

4 

3 

2 

5 

2 

5 

1 

G 

2 

5 

3 

o 

3 


3 

4 

2 

5 


1 

3 

4 

4 

3 


3 

n 

3 

4 

3 

3 

ti 

5 

2 

3 

4 

4 

3 

3 

4 

3 

4 

1 

6 

y 

1 

y 

y 

2 

5 

6 

1 

6 

1 

y 

3 

4 

3 

G 

1 

2 

5 

3 

4 


3 

H 

3 

5 

2 

3 

n 

5 

2 

H 

3 

6 

1 

3 

n 



5 

2 

2 

5 

H 

3 

3 

4 

3 


5 

2 

5 

2 

3 

4 

3 

tl 



3 

4 

4 

3 

3 

4 

4 

3 

3 

n 

n 

3 

4 

3 

4 

3 

3 

n 



4 

3 

( 

3 

jO 

3 

3 

4 

2 

5 

tl 

3 

3 

4 

5 

2 

3 

4 



4 

3 

4 

3 


3 

5 

2 

3 

y 

2 

5 

2 

5 

3 

4 

2 

5 



5 

2 

2 

5 

H 

3 

5 

2 

3 

H 

1 

6 

2 

5 

1 

0 

3 

4 



5 

2 

5 

2 

2 

5 

2 

5 

1 

6 

2 

5 

2 

5 

3 

4 

2 

5 



1 

6 

3 

4 

El 

1 

3 

4 

o 

3 

2 

5 

4 

3 

3 

4 

2 

5 



4 

3 

5 

2 

u 

3 

3 

4 

El 

3 

5 

2 

0 

7 

3 

4 

5 

2 



2 

5 

5 

2 

y 

3 

2 

5 

1 

D 

3 

4 

5 

2 

3 

4 

4 

3 



1 

6 

3 

4 

H 

3 

2 

5 

5 

2 

5 

2 

3 

4 

4 

3 

3 

4 



3 

4 

2 

5 

tl 

3 

4 

3 

4 

3 

4 

3 

5 

2 

2 

5 

2 

5 



1 

6 

6 

1 

6 

1 

3 

n 

1 

G 

3 

4 

I 

6 

3 

4 

2 

5 



3 

El 

2 

5 

(1 

3 

3 

u 

2 

5 

2 

5 

4 

3 

4 

3 

3 

4 



3 

y 

2 

5 

y 

3 

3 

y 

2 

5 

G 

1 

5 

2 

2 

5 

3 

4 



3 


1 

6 

H 

3 

3 

H 

4 

3 

4 

3 

n 

6 

3 

4 

2 

5 



5 

2 

3 

4 

2 

5 

5 

2 

3 

4 

2 

5 

o 

3 

5 

2 

1 

6 



3 

4 

Efl 

3 

2 

5 

3 

4 

4 

3 

4 

3 

2 

5 

G 

1 

3 

o 



4 

3 

111 

3 

3 

4 

4 

3 

3 

4 

2 

5 

5 

2 

H 

1 

3 

it! 



4 

3 

3 

4 

4 

3 

3 

4 

5 

2 

2 

5 

1 

6 

U 

3 

y 

3 



5 

2 

4 

3 

1 

6 

4 

3 

3 

4 

4 

3 

3 

n 

3 

4 

R 

3 



4 

3 

2 

5 

3 


3 

n 

3 

4 

5 

2 

3 

ti 

5 

2 

ti 

3 



1 

0 

3 

mm 

3 


3 

n 

2 

5 

4 

3 

n 

3 

3 

4 

3 

4 



a 

2 

3 

ftf 

4 


2 

5 

n 

3 

3 

4 

ti 

3 

in 

3 

4 

3 



3 

4 

5 

2 

5 


4 

3 

u 

3 

2 

5 

2 

5 

II 

3 

3 

4 



5 

2 

3 

4 

3 


3 

n 

3 

n 

5 

2 

3 

4 

3 

4 

4 

3 



5 

2 

5 

2 

5 

2 

3 

n 

3 

ri 

2 

5 

4 

3 

2 

5 

G 

1 



5 

2 

5 

2 

5 

2 

4 

3 

4 

3 

4 

3 

3 

4 

3 

4 

5 

2 



2 

.5 

mm 

3 

2 

5 

5 

2 

3 

4 

3 

4 

n 

3 

2 

5 

3 

4 



4 

3 

Kfl 

3 

y 

3 

1 

Q 

5 

2 

H 

1 

y 

3 

5 

2 

1 

6 



4 

3 

3 

4 

II 

3 

4 

3 

5 

2 


3 

H 

3 

3 

n 

5 

2 



5 

2 

2 

5 

5 

2 

2 

O 

3 

4 

u 

3 

o 

3 

7 

□ 

3 

4 



5 

2 

6 

1 

5 

2 

2 

5 

5 

2 

tl 

3 

5 

2 

1 

6 

4 

3 



3 

4 

4 

3 

3 

4 

3 

4 

5 

2 

3 

4 

5 

2 

EJ 

|l 

3 

4 



5 

2 

3 

D 

4 

3 

2 

5 

G 

1 

2 

5 

4 

3 

3 

D 

4 

3 



4 

3 

3 

VI 

5 

2 

3 

4 

5 

2 

n 

3 

3 

4 

2 

5 

3 

4 



4 

3 

4 

3 

2 

5 

. r > 

2 

2 

5 

if 

3 

5 

2 

El 

3 

5 




3 

4 

5 

2 

4 

3 

2 

5 

4 

3 


3 

1 

G 

tl 

3 

3 




4 

3 

2 

5 

5 

2 

3 

4 

3 

4 


3 

El 

2 

3 

El 

3 




1 

6 

3 

4 

1 

H 

H 

3 

2 

5 

u 

3 

y 

3 

3 

D 

4 

O 



2 

5 

n 

3 

3 


H 

2 

n 

1 

3 

4 

H 

3 

u 

3 

5 

2 



1 

6 

H 

3 

3 

U 

2 

5 

3 

4 

4 

3 

3 

4 


3 

. r > 

2 



2 

o 

5 

2 

6 


4 

3 

4 

3 

3 

4 

4 

3 

n 

3 

3 

4 



3 

n 

4 

3 

3 

fi 

3 

4 

3 

n 

4 

3 

1 

El 

3 

4 

4 

3 



3 

H 

6 

1 

2 

5 

4 

3 

3 


2 

5 

4 

3 

5 

2 

3 

4 



3 

H 

n 

2 

3 

4 

5 

2 

3 

H 

3 

4 

3 

4 

1 

G 

5 

2 

#» 



3 

i 

Kf 

3 

2 

5 

SI 

2 

2 

5 

2 

5 

5 

2 

u 

3 

H 

/> 

ft 



4 

3 

WBi 

3 

G 

1 

H 


1 

G 

2 

.*> 

H 

3 

n 

3 

El 

3 



3 

mm 

6 

1 

1 

6 

8ri 

3 

3 

4 

4 

3 

fa 

0 

3 

4 

K1 

0 



3 

H 

1 

m 

3 

a 

1 5 

2 

a 

3 

2 

5 

II 

3 

5 

2 

l 

3 









































































































































































































































Exercises 

1. In Table I are recorded the height and weight of 629 freshman men 
in the year 1923-1924 at the University of Washington. These records 
were selected from the University Health Service cards by taking every other 
card as arranged in alphabetic order. 

From this table is it easy to determine the average height and average 
weight of freshmen? Is it easy to determine the most common height and 
the most common weight? 

2. Construct Table II as follows: 

(a) By examining Table I the largest and smallest values of height and
weight are determined.

(b) A form like that of Table II is prepared by entering in the first column
all possible values of the variable weight from smallest to largest and in a row 
across the top all possible values of the variable height from smallest to 
largest. Then horizontal and vertical lines are drawn between the values 
of the variates dividing the sheet of paper into cells, one cell for each value of 
weight and each value of height that may be associated with it. 

(c) Beginning with the first entry in Table I, make a score mark in Table II 
in the cell corresponding to this first entry of Table I. That is, make a score 
mark in the cell that is in the column headed 75.0 and in the line headed 
157. Proceed in like manner with each entry of Table I. 

(d) Prepare a final form like that of Table II, entering in each 
cell the number of score marks in the corresponding cell of the table 
obtained in (c). 

(e) Form the right-hand column of totals of entries in each line, and the 
bottom row of totals of entries in each column. See that the total of the 
column of totals and of the line of totals is the same and equal to the total
number of pairs of entries in Table I. If this is not so, find the mistake and
correct it. 

3. From Table II is it any easier to find the average height and the 
average weight of freshmen? Why?

Is it easier to determine the most common height and the most common 
weight? Why? 

4. Which is more common, small deviations from the average height or 
large deviations from the average height? Explain how determined. Is 
the same answer true in regard to weight? 

5. Is there any evidence that tall men have a tendency to be heavy and
short men light? What shows this? 

6. If you were given the height of some freshman, could you, by means 
of this table, predict his weight either exactly or approximately? On what 
considerations do you base your answer? 

7. As shown by Table II, about what weight would you expect a freshman
of height 71.0 in. to be?

8. By inspection of Table III ascertain in which months Seattle has
the greatest precipitation. In which months the least.

9. What is the average for each of the two months of greatest precipitation
during the period 1900-1923 inclusive?

What is the average for each of the two months of least precipitation? 





10. From Table IV determine the average monthly temperature for each 
of the two hottest months of the year and each of the two coldest months 
of the year in Seattle for the period 1900-1923 inclusive. 

11. Table V shows the number of heads and tails that fell in each of 500 
throws of seven dimes. Is it easy to determine from this table what number 
of heads occurred most often? 

12. If the dimes of Ex. 11 had been thrown 5,000 times, would you 
expect the same result as to the number of heads occurring most often? 

13. If the dimes of Ex. 11 had been thrown only 10 times, would you 
expect the same result as to number of heads occurring most often? 

14. Toss 10 coins 500 times, recording the number of heads and tails at 
each throw, and forming a table similar to Table V. 

15. Suppose that at the weather station for some locality we can obtain
the daily precipitation for each day for 50 years. Suggest methods of 
selecting a sample of some 500 to 600 daily records from which the average 
daily precipitation over the entire 50 years may be obtained with a fair 
degree of accuracy. 

16. A 100-ft. steel tape was used in measuring a distance, which was 
found to be 1,243.7 ft. It was afterwards found that the tape was a 
hundredth of a foot too long. What correction should be made in the 
measured distance? 
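
The experiment of Ex. 14 may also be imitated by machine. The following is only a sketch, assuming fair coins and a pseudo-random generator in place of actual tosses; the function names and the seed are illustrative choices and are not part of the text. It records the heads and tails of each throw, as in Table V, and then counts how often each number of heads occurred.

```python
import random
from collections import Counter


def toss_coins(n_coins, n_throws, seed=None):
    """Return one (heads, tails) pair per throw of n_coins fair coins."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    throws = []
    for _ in range(n_throws):
        heads = sum(rng.random() < 0.5 for _ in range(n_coins))
        throws.append((heads, n_coins - heads))
    return throws


def heads_frequency(throws):
    """Count how many throws gave each number of heads."""
    return Counter(heads for heads, _tails in throws)


throws = toss_coins(n_coins=10, n_throws=500, seed=1925)
freq = heads_frequency(throws)
for heads in range(11):
    print(f"{heads:2d} heads: {freq.get(heads, 0):3d} throws")
```

Changing `n_throws` to 10 or to 5,000 gives a rough feeling for the questions raised in Exs. 12 and 13.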


CHAPTER II 


TABULATION, FREQUENCY DISTRIBUTION 

1. In order to compare two or more variables or to study the 
changes in a single variable, proper tabulation is of prime importance.
This is not always an easy problem. The easiest way to
tabulate a set of observations is to record each one as it is made, 
making a column of values in order of occurrence. Such a table 
is that of monthly rainfall, in Table III, in the first chapter. If 
it is desired to show which are the rainiest months of the year 
and which the driest or what amount of monthly rainfall occurs 
most often, the data should be tabulated in different form so as to 
bring out the desired information. For other characteristics it 
might be desirable to use a still different form of table. 

2. Not many hard-and-fast rules can be given as to the form a
table shall take. Experience is the one thing which gives greatest
skill in making tabulations. This much may be said: there is
usually some one best form to use in order to bring out the characteristics
to be studied or to show the comparisons desired. It
is the business of the tabulator to find this best form and use it.
Frequently this is not an easy matter.

3. Best Form.—First the problem to be solved, the purpose to 
be served by the tabulation, should be formulated and written 
down. Examine the data to see if they furnish the necessary 
information. If not, further information must be obtained 
before tabulation can proceed. 

Next decide whether the data shall all be placed in one table
or in two or more. A large mass of data may make a single
table too unwieldy. If possible, it should be easy for the eye to
follow from one entry to another with which it is to be compared, or
to follow through a series of entries related to each other. It may
be that in order to do this, data will have to be split up into several
tables. Usually it is unwise to try to make one table serve
for the solution of two or more distinct problems. If two
series are to be compared, it is desirable to place them in adjacent
columns.




The main headings, subheadings, and number of columns must 
be decided upon. By a proper use of main headings and subheadings,
space may be saved and those things to be compared
brought near to each other. Thus a larger amount of data may 
frequently be brought into a single table than could otherwise be 
done. This process can be carried too far and the table be made 
too large and confusing. The judgment of the tabulator and the 
purpose to be served will have to govern the matter. 

If several different forms of tabulation are possible, it is well to 
sketch a form which seems to serve the purpose. Then see if 
this form gives a ready solution of the problem. It may soon 
appear that another form will make the solution of the problem 
easier, bring out the meaning of the figures more clearly, and 
make the table more easily interpreted by the reader. The judgment
of the tabulator should soon tell him when the best form
has been found. 

4. Placing on Sheet.—Having decided on the best form to use, 
a sufficient number of figures should be entered to make it possible 
to adjust the column widths so that the table may fit the page 
on which it is to be finally printed. If possible adjust the table 
so it may be placed upright on the page with the top of the 
table at the top of the page. If it is necessary to make the table 
the other way of the sheet, place it so that a quarter right-hand 
(clockwise) turn will make the table right side up. This places 
the top of the table at the left-hand edge of the sheet. In the 
same way, if a column heading cannot be written across the 
column, it should be placed so that a quarter right-hand turn will 
make the printing right side up. 

5. Title and Headings.—The title of the table should be as concise
as possible. It must, however, make perfectly clear what
material is shown. It should be unambiguous. The same remarks
apply to column headings. Somewhere, either in the title or
column heading, the unit of measurement used should be stated.
Footnotes may be used to show the source of data or to make
clear the meaning or content of any column or of the whole table.
Avoid too many footnotes. It is better that each heading be
completely self-explanatory if possible.

It is best that titles, column headings, and figures be all printed 
by hand. Ordinary script is not permissible. Different size 
letters can then be used according to the importance of the heading.
Headings of the same degree of importance should be printed
all in the same size of letter. In some cases typewriting may
be used. If so, a typewriter must be chosen that gives neat, 
accurate, uniform work. If the tables are to be reproduced by 
some photographic printing process, a typewriter ink must be 
used that will photograph. 

6. Neatness and accuracy are prime essentials. Failure in 
either respect makes the work of little value. A single mistake 
makes the entire tabulation untrustworthy. Transcription of 
figures and all computations should be checked and rechecked. 

7. Illustrations.—The following tables illustrate some of the 
points mentioned above. Suppose the data are the population 
and area of each state of the United States. If the purpose is 
merely to record the facts so that they may be available for 
reference, the states may be arranged in alphabetical order, as 
shown in Table VI. 


Table VI.—Population and Area of the United States

Census of 1920

State               Population    Area, in square miles

Alabama.             2,348,174           51,998
Arizona.               334,162          113,956
Arkansas.            1,752,204           53,335
California.          3,426,861          158,297
Colorado.              939,629          103,948
Connecticut.         1,380,631            4,965
etc.                      etc.             etc.


The purpose of this table would be served as well if the last 
two columns were interchanged. 

If it is desired to compare populations, to show which states
have the large populations and which the small, the states should
be arranged in column so that the column of populations will give
the values in order of size. It may begin with the largest and proceed
to the smallest or vice versa. The areas do not enter into
consideration and need not be tabulated. Table VII shows the
form, beginning with the state of largest population.

If it is desired to show populations by geographic distribution,
the column of states may begin with those states of the New
England division, then the Middle Atlantic division, and so on
across the country to the Pacific division. It will be well to show
the total for each geographic division. Table VIII shows the
form.


Table VII.—Population of the United States

Census of 1920

State               Population

New York.           10,385,227
Pennsylvania.        8,720,017
Illinois.            6,485,280
Ohio.                5,759,394
Texas.               4,663,228
Massachusetts.       3,852,356
etc.                      etc.


Similar tables classifying areas only, instead of populations, 
may be desired. In such case population figures would not be 
tabulated. 


Table VIII.—Population of the United States by Geographic Distribution

Census of 1920

State               Population

New England division:
Maine.                 768,014
New Hampshire.         443,083
Vermont.               352,428
Massachusetts.       3,852,356
Connecticut.         1,380,631
Rhode Island.          604,397
Total.               7,400,909
etc.                      etc.
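
The totals by geographic division can be checked mechanically. The sketch below is illustrative only: the populations are those quoted in Tables VII and VIII, the pairing of the New England figures with state names follows the 1920 census and is supplied here as an assumption, and the Middle Atlantic list is deliberately incomplete because only two of its states are quoted in the text.

```python
from collections import defaultdict

# (state, division, population). New England figures as in Table VIII;
# New York and Pennsylvania as in Table VII. New Jersey is omitted
# because its population is not quoted in the text.
records = [
    ("Maine", "New England", 768_014),
    ("New Hampshire", "New England", 443_083),
    ("Vermont", "New England", 352_428),
    ("Massachusetts", "New England", 3_852_356),
    ("Connecticut", "New England", 1_380_631),
    ("Rhode Island", "New England", 604_397),
    ("New York", "Middle Atlantic", 10_385_227),
    ("Pennsylvania", "Middle Atlantic", 8_720_017),
]


def division_totals(rows):
    """Add up the populations of the states in each division."""
    totals = defaultdict(int)
    for _state, division, population in rows:
        totals[division] += population
    return dict(totals)


totals = division_totals(records)
for division, total in totals.items():
    print(f"{division:<16} {total:>12,}")
```

The New England total so obtained agrees with the 7,400,909 printed in Table VIII, which is the kind of check recommended in Sec. 6.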


8. Derived Data.—Other tables would be derived for problems
involving both population and area. For instance, suppose
it is desired to make a study of density of population, the
population per square mile. The densities may be derived for
each state from the original population and area data. If the
table is merely for reference, the states will be tabulated in
alphabetic order as was done in Table VI. The states may be
arranged by geographic divisions, according to numerical value
of the density, by size of population, or by magnitude of area,
according to what particular feature it is desired to bring out.

Table IX shows density of population with the states arranged
according to size of population.


Table IX.—Population, Area, and Density of Population of the United States

Census of 1920

State               Population    Area in square    Density of population
                                      miles         (number per square mile)

New York.           10,385,227        49,204              211.1
Pennsylvania.        8,720,017        45,126              193.5
Illinois.            6,485,280        56,665              114.4
Ohio.                5,759,394        41,040              140.3
Texas.               4,663,228       265,896               17.5
Massachusetts.       3,852,356         8,266              466.0
etc.                      etc.          etc.               etc.



The contrast in density for Texas and Massachusetts is very
noticeable, especially since Texas has a greater population than
has Massachusetts. The reason is at once seen by examining the
column of areas: Texas has more than 30 times the area of
Massachusetts.
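
The derivation of a density column from the population and area columns can be sketched as follows. This is an illustration under simple assumptions (density taken as population divided by the tabulated area, rounded to one decimal); the function name `density` is hypothetical, and the figures are those of Table IX.

```python
# (state, population, area in square miles), as printed in Table IX.
states = [
    ("New York", 10_385_227, 49_204),
    ("Illinois", 6_485_280, 56_665),
    ("Ohio", 5_759_394, 41_040),
    ("Texas", 4_663_228, 265_896),
    ("Massachusetts", 3_852_356, 8_266),
]


def density(population, area_sq_mi):
    """Population per square mile."""
    return population / area_sq_mi


for name, population, area in states:
    print(f"{name:<15} {population:>11,} {area:>8,} {density(population, area):>7.1f}")

# How many times larger Texas is than Massachusetts, by area.
area_ratio = 265_896 / 8_266
print(f"Texas has {area_ratio:.1f} times the area of Massachusetts")
```

Keeping the original columns beside the derived one, as Table IX does, lets the reader see at once why two states of similar population may differ widely in density.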


9. Time and Place Variation.—Population, area, density,
the variables in Tables VI to IX, vary from one place to another
place, time remaining constant. The same elements, population,
area, density, for the United States or for any one state might
have been tabulated for each census year. The variables then
vary from one time to another with place remaining constant.
Any one of these elements, say population, may be tabulated in one
table as both historical variable and place variable.
Names of the states would be placed in column, and a column
made for each date. The form is shown in Table X.










Table X.—Population of the United States at Each Census from 

1900 to 1920 

Source: Report of the Bureau of the Census 


State 1900 1910 1920 


Alabama. 1,828,697 2,138,093 2,348,174 

Arizona. 122,931 204,354 334,162 

Arkansas. 1,311,564 1,574,449 1,752,204 

California. 1,485,053 2,377,549 3,426,861 

Colorado. 539,700 799,024 939,629 

Connecticut. 908,420 1,114,756 1,380,631 

etc. etc. etc. etc. 


All three elements might be thus tabulated as both historical 
variable and place variable. In this case each census date would 
constitute a subheading with three columns under it, one for each 
of three elements, population, area, and density. The table is 
now likely to become large and unwieldy. If, however, it is 
desirable to have the data tabulated in this form, the sheet may be 
made large and then folded in to fit the page size of the report. 
Many excellent examples of tabulation of place variables, time
variables, and combinations of both may be found in the “Statistical
Abstract of the United States,” Department of Commerce,
and in many other publications from the Government Printing
Office.
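
The arrangement of Table X, with place running down the page and time running across it, can be sketched in a few lines. The example is illustrative only: it assumes the observations arrive as separate (state, year, population) records, uses three of the states of Table X, and the function name `two_way_table` is hypothetical.

```python
# One record per state and census year; figures as printed in Table X.
records = [
    ("Alabama", 1900, 1_828_697), ("Alabama", 1910, 2_138_093), ("Alabama", 1920, 2_348_174),
    ("Arizona", 1900, 122_931), ("Arizona", 1910, 204_354), ("Arizona", 1920, 334_162),
    ("Arkansas", 1900, 1_311_564), ("Arkansas", 1910, 1_574_449), ("Arkansas", 1920, 1_752_204),
]


def two_way_table(rows):
    """Arrange records with one line per state and one column per year."""
    years = sorted({year for _state, year, _pop in rows})
    table = {}
    for state, year, population in rows:
        table.setdefault(state, {})[year] = population
    return years, table


years, table = two_way_table(records)
print("State".ljust(12) + "".join(str(year).rjust(12) for year in years))
for state in sorted(table):
    cells = "".join(
        f"{table[state][year]:>12,}" if year in table[state] else " " * 12
        for year in years
    )
    print(state.ljust(12) + cells)
```

Tabulating all three elements at once would need a further level of subheadings under each census date, which is where the table begins to grow unwieldy.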

10. Quality Variation.—Besides the time variable and the 
place variable, there is what may be called the quality variable 
that is sometimes studied. The variable is classified according 
to the presence or absence of certain characteristics. Thus a 
certain population might be studied as to the association of 
blindness and deaf-mutism. The tabulation would take the form
of Table XI.


Table XI.—Blindness and Deaf-mutism of 1,163 Paupers



















Such studies lead to the theory of attributes or theory of 
contingency. This theory is well treated in Yule’s “Introduction 
to the Theory of Statistics,” but there will be no occasion to take 
it up in this book. 

11. Frequency Tables.—Another very important form of 
tabulation is that of the frequency table, which has been defined 
in Chap. I. This will now be studied in more detail. 

12. Classifications.—Turn to Table I of weights and heights 
of university freshmen and proceed to make a frequency table of 
weights containing about 30 classes. By examination of the 
weights, it is seen that they run from 100 to 216 lb., a range of 
217 - 100 = 117 lb. Thirty is contained in 117 between three
and four times, nearly four times. Use a class interval of 4 lb. 
The first class, then, may contain all those weights from 97 to 100 
lb., inclusive, 98 to 101, 99 to 102, or 100 to 103. Considerations 
taken up later would determine which to use. In the table as 
formed, the first class includes all those weights from 97 to 100 lb., 
inclusive, and so on to the thirtieth class, which contains all 
weights from 213 to 216 lb. inclusive. 
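
The arithmetic of this section can be laid out as a short sketch. It assumes weights recorded in whole pounds and takes the first lower limit (97) as given, since the text defers the reasons for that choice; the function name `plan_classes` is hypothetical.

```python
import math


def plan_classes(smallest, largest, desired_classes, first_lower):
    """Choose a whole-number class interval and list the class limits."""
    span = largest + 1 - smallest            # 217 - 100 = 117 in the text
    width = math.ceil(span / desired_classes)
    if not (smallest - width < first_lower <= smallest):
        raise ValueError("first class must contain the smallest value")
    limits = []
    lower = first_lower
    while lower <= largest:
        limits.append((lower, lower + width - 1))  # inclusive class limits
        lower += width
    return width, limits


width, limits = plan_classes(smallest=100, largest=216,
                             desired_classes=30, first_lower=97)
print("class interval:", width)
print("number of classes:", len(limits))
print("first class:", limits[0], " last class:", limits[-1])
```

Starting the first class at 98, 99, or 100 instead, as the text allows, only requires changing `first_lower`.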

13. The finished table, Table XII, has three columns. The
first column contains the weight classes, or size of items, designated
by the letter X. The second column displays the
frequencies, or number of items in each class, designated by the
letter f. These two columns are made up directly from the original
measurements as recorded in Table I. The third column,
cumulative frequency, is obtained from the second column by
successive additions of frequencies so that the last entry in this
column equals the total frequency.
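
The making of the frequency and cumulative-frequency columns can be sketched as below. The example is a small illustration and not the full tabulation: it uses only the first twelve weights of the continued part of Table I, assumes whole-pound weights and the 4-lb. classes beginning at 97, and the function name `frequency_table` is hypothetical.

```python
from itertools import accumulate


def frequency_table(values, first_lower, width, n_classes):
    """Return (lower limit, upper limit, f, F) for each class."""
    freqs = [0] * n_classes
    for value in values:
        index = (value - first_lower) // width   # which class the value falls in
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} lies outside the classes")
        freqs[index] += 1
    cumulative = list(accumulate(freqs))          # successive additions of f
    rows = []
    for i, (f, big_f) in enumerate(zip(freqs, cumulative)):
        lower = first_lower + i * width
        rows.append((lower, lower + width - 1, f, big_f))
    return rows


# First twelve weights in the continued part of Table I.
sample = [178, 127, 126, 145, 135, 150, 144, 156, 137, 121, 159, 145]
rows = frequency_table(sample, first_lower=97, width=4, n_classes=30)
for lower, upper, f, big_f in rows:
    if f:  # print only the classes that received an entry
        print(f"{lower}-{upper}   f = {f}   F = {big_f}")
```

As the text states, the last cumulative entry must equal the total frequency, here the twelve weights supplied.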

14. Class Limits and Class Boundaries.—The pairs of numbers
written in the column of classes are the upper and lower class
limits. Halfway between the upper limit of one class and the
lower limit of the next class is a class boundary. The upper
boundary of one class is the lower boundary of the next class. For
instance, the upper boundary of the third class in Table XII is
108.5, which is the lower boundary of the fourth class. The lower
boundary of the first class is 96.5.

When the men were weighed, their weights were recorded to the
nearest pound, though the scales might have been read to less
than half an ounce. Thus all weights more than 103.5 and less
than 104.5 were recorded as 104. The third class in Table XII





Table XII.—Weights of Freshman Men
In pounds

Weight classes    Frequency    Cumulative frequency
      X               f                 F

   97-100             1                 1
  101-104             2                 3
  105-108             4                 7
  109-112            15                22
  113-116            12                34
  117-120            22                56
  121-124            34                90
  125-128            44               134
  129-132            62               196
  133-136            57               253
  137-140            56               309
  141-144            61               370
  145-148            55               425
  149-152            42               467
  153-156            40               507
  157-160            25               532
  161-164            30               562
  165-168            14               576
  169-172            15               591
  173-176            13               604
  177-180            10               614
  181-184             4               618
  185-188             2               620
  189-192             2               622
  193-196             3               625
  197-200             1               626
  201-204             0               626
  205-208             2               628
  209-212             0               628
  213-216             1               629

  Total             629



then includes all those weights which were more than 104.5 and 
less than 108.5, the class boundaries. The recorded weights in 
this class were only from 105 to 108, inclusive, the class limits. 








The difference between the upper limit of one class and the lower
limit of the next class is always the smallest division in the measurements
used, in this case, 1 lb. The distinction between class
boundaries and class limits is an important one in plotting graphs
and in determining various division points.
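
The relation between class limits and class boundaries can be sketched as follows. This is an illustrative Python fragment added here; the function name `class_boundaries` and its `unit` parameter are assumptions made for the example, with `unit` standing for the smallest division of the measurements (1 lb. for the weights).

```python
def class_boundaries(limits, unit=1):
    """Return (lower, upper) boundaries for consecutive (lower, upper) class limits.

    Each boundary lies half a unit beyond the recorded limit, that is,
    halfway between the upper limit of one class and the lower limit of the next.
    """
    half = unit / 2
    return [(low - half, high + half) for low, high in limits]

# First four weight classes of Table XII.
limits = [(97, 100), (101, 104), (105, 108), (109, 112)]
boundaries = class_boundaries(limits)
print(boundaries)
```

For the third class the sketch gives boundaries 104.5 and 108.5, agreeing with the values stated in the text.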

If, in the process of weighing, there happened to be several 
weights that seemed to be just 103.5, half of them may be 
recorded as 104 and half as 103. Or, if there is an odd number, 
the computers’ rule may be followed for the odd one which drops 
the 0.5 or adds on another 0.5 according to which gives an even 
number. In the long run, this rule results in dropping 0.5 about 
the same number of times that an extra 0.5 is added on, thus 
making the errors compensating. Or, a coin may be flipped to 
make the decision. The making of the frequency table is not 
concerned with this matter, since the measurements as recorded 
are the ones used. 
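
The computers' rule for an exact half can be tried out with Python's `decimal` module, whose `ROUND_HALF_EVEN` mode sends a half to the even neighbor. This is a sketch added for illustration; the function name `computers_rule` and the sample readings are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def computers_rule(reading):
    """Round a reading (given as a string) to the nearest whole unit,
    sending an exact half to whichever neighbor is even."""
    return int(Decimal(reading).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))

# Hypothetical scale readings that fall exactly on a half pound.
readings = ["102.5", "103.5", "104.5", "105.5"]
recorded = [computers_rule(r) for r in readings]
print(recorded)
```

Readings are passed as strings so that the half is represented exactly rather than as a binary approximation.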

15. Forming the Table.—Having decided on the class interval
and the limits of the first class, complete the first column showing
the class limits for all classes in order. Go through Table I of
weights and check off each weight by a score mark opposite the
class into which it falls. When this is done, add up the score
marks for each class and enter the sum in the frequency column
opposite the class to which it belongs. Total the frequencies.
Form the cumulative frequency column. Check the last entry on
the total of frequencies, and check both on the total number of
individuals measured in Table I.
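
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's procedure: the function name `frequency_table` is invented here, the sample weights are hypothetical (they are not taken from Table I), measurements are assumed to be whole numbers, and every value is assumed to fall inside the classes provided.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values, first_lower, interval, n_classes):
    """Tally whole-number `values` into consecutive classes of equal width.

    Returns a list of rows (lower_limit, upper_limit, f, F).
    """
    # One score mark per value, filed under the index of its class.
    tally = Counter((v - first_lower) // interval for v in values)
    freqs = [tally.get(i, 0) for i in range(n_classes)]
    cums = list(accumulate(freqs))
    rows = []
    for i, (f, F) in enumerate(zip(freqs, cums)):
        low = first_lower + i * interval
        rows.append((low, low + interval - 1, f, F))
    return rows

# Hypothetical recorded weights in pounds.
weights = [100, 103, 104, 105, 108, 108, 111, 112, 112, 115]
table = frequency_table(weights, first_lower=97, interval=4, n_classes=5)
for row in table:
    print(row)
```

The last cumulative frequency should equal the number of individuals measured, which is the final check the text calls for.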

16. Distribution of Frequencies.—Study the frequency
column. What do you note in regard to change of frequency?
Do heavy-weights, light-weights, or middle-weights occur most
frequently? Could this have been readily seen from Table I?
Was it to have been expected? Is there regularity in change of
frequency?

17. Effect of Change of Class Interval.—In the same manner
that Table XII was constructed, another frequency table of the
same weights is made with a larger class interval, giving fewer classes.
This table, Table XIII, has eleven classes with a class interval of
11 lb. As in Table XII, it is seen in this table that light-weights
and heavy-weights occur less frequently than middle-weights.
In Table XII it should be noticed that the frequencies are sometimes
increasing and sometimes decreasing, but on the whole
increasing up to about the middle of the table. Then they are






sometimes decreasing and sometimes increasing, but on the whole
decreasing to the end of the table. In Table XIII, one of the
most noticeable things is that these irregularities have been wiped
out. The frequencies are everywhere increasing up to a maximum
in the fourth class, and then everywhere decreasing to the
end of the table. Increasing the class interval tends to smooth
out irregularities in variation of frequencies. This fact will be
very noticeable when graphs of these two sets of frequencies are
made in a later chapter. Usually it is permissible to increase the
class interval until a relatively smooth variation of frequencies is
obtained. In some cases, however, certain considerations may
demand the retention of the original measurements. Increasing
the class interval should not be carried too far. Manifestly a
class interval large enough to give only two classes would tell very
little with regard to distribution of frequencies.
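
The smoothing effect of a wider class interval can be checked numerically. The Python sketch below is an illustration added here: it combines every three adjacent 4-lb. classes of Table XII into 12-lb. classes (which are not the 11-lb. classes of Table XIII, since those cannot be formed from the grouped figures alone) and counts how often the frequencies switch between rising and falling. The function names are hypothetical.

```python
# Frequencies of the thirty 4-lb. classes of Table XII.
f_4lb = [1, 2, 4, 15, 12, 22, 34, 44, 62, 57, 56, 61, 55, 42, 40,
         25, 30, 14, 15, 13, 10, 4, 2, 2, 3, 1, 0, 2, 0, 1]

def merge_classes(freqs, k):
    """Combine every k adjacent classes into one wider class."""
    return [sum(freqs[i:i + k]) for i in range(0, len(freqs), k)]

def direction_changes(freqs):
    """Count how often the frequencies switch between rising and falling."""
    signs = [(b > a) - (b < a) for a, b in zip(freqs, freqs[1:])]
    signs = [s for s in signs if s != 0]  # ties are ignored
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

f_12lb = merge_classes(f_4lb, 3)  # ten classes of 12 lb. each
print(f_12lb)
print(direction_changes(f_4lb), direction_changes(f_12lb))
```

With the wider classes the frequencies rise to a single maximum and then fall, so only one change of direction remains.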

18. Frequency Table of Heights.—Table XIV is a frequency
table of the heights in inches of the 629 freshmen given in Table I
whose weights have just been tabulated. The heights were
recorded to the nearest half-inch. The range was from 60.5 to
75.0 in., or 15.0 in., which is 30 units of measurement in half-inches.
It being desired in a later chapter to compare the distribution
of heights with that of weights, it is convenient to have
about the same number of classes in Table XIV as in Table XIII.
A class interval of 1.5 in. gives 11 classes, the same number as in
Table XIII.


Table XIII.—Weights of Freshman Men
In pounds

Weight classes    Frequency    Cumulative frequency
      X               f                 F

  100-110            15                15
  111-121            43                58
  122-132           138               196
  133-143           162               358
  144-154           129               487
  155-165            82               569
  166-176            35               604
  177-187            16               620
  188-198             5               625
  199-209             3               628
  210-220             1               629

  Total             629






Table XIV.—Heights of Freshman Men
In inches

Height classes    Frequency    Cumulative frequency
      X               f                 F

  60.0-61.0           3                 3
  61.5-62.5           8                11
  63.0-64.0          33                44
  64.5-65.5          51                95
  66.0-67.0         115               210
  67.5-68.5         156               366
  69.0-70.0         148               514
  70.5-71.5          64               578
  72.0-73.0          43               621
  73.5-74.5           5               626
  75.0-76.0           3               629

  Total             629



19. Comparison of Frequency Distributions.—Examination of
the frequency column shows an increase of frequencies with increase
of height up to a certain point and then a decrease to the end of
the table. This was also true of the weights. It is noticeable,
however, that there is greater symmetry of distribution in heights
than in weights. The number of small heights is about the same
as the great heights. The number of small weights is considerably
more than the number of great weights. The class of greatest
height frequency, 67.5 to 68.5, is at the middle of the table.
The class of greatest weight frequency, 133 to 143, is nearer the
beginning of the table. In the next chapter these facts will be
shown at a glance of the eye by means of frequency graphs.
There seems to be a greater scattering of weights as compared
with the average weight than there is of heights as compared
with average height. In other words, heights tend to concentrate
about the average height more than weights tend to concentrate
about the average weight. Numerical measures of this
tendency will be devised in the chapter on dispersion.

20. Chance Distribution.—Another illustration of frequency
distribution is shown by the results of pure chance taken from
Table V. Tabulating number of heads as size of item and the
number of times each occurred in the 500 throws as frequency,
Table XV is obtained.







The similarity of the frequency distribution of Table XV to 
that of Table XIV is very striking. Its similarity to that of 
Table XII is not so great. This indicates that freshman heights 
follow more closely a symmetric distribution like that due to 
pure chance in throwing coins than do their weights. It would 
seem, then, that there is an intimate connection between the laws 
of probability governing the number of heads falling in throwing 
coins and the distribution of freshman heights. 


Table XV.—Number of Heads Falling in 500 Random Throws of
Seven Dimes

Number of heads,    Number of times occurring,    Per cent of total
size of item, X     frequency, f

       0                      1                          0.2
       1                     31                          6.2
       2                     76                         15.2
       3                    146                         29.2
       4                    144                         28.8
       5                     76                         15.2
       6                     23                          4.6
       7                      3                          0.6

     Total                  500                        100.0


The next point of interest arising at once is to discover how 
closely the frequency distribution of the 500 throws of seven 
dimes follows the theoretical distribution called for by the laws 
of probability. 

21. Probability Theorems.—Following are the four fundamental
theorems in probability. 1

Theorem I.—The probability that all of a set of independent
events will occur is the product of the separate probabilities of
the occurrence of each of the single events. This is compound
probability.

The probability that any given event, in a set of mutually
exclusive events, will happen, and all others fail, is called partial
or relative probability.

Theorem II.—The probability that any one whatever of a set of
mutually exclusive events will occur, and all the others fail, is the
sum of the partial probabilities of the single events. This is
total probability.

1 The proof of these laws may be found in Appendix C.

Theorem III.—If the probability that an event will occur in a
single trial is p, and the probability that it will fail is q, then
the probability that it will occur exactly r times, no more no less,
in n trials is ${}_nC_r\,p^r q^{n-r}$ ($p + q$ always equals 1). ${}_nC_r$ is the
number of combinations of n things taken r at a time (see
Appendix B).

Theorem IV.—If the probability that an event will occur in a
single trial is p, and the probability that it will fail is q, then the
probability that it will occur at least r times in n trials (r times or
more) is the sum of the first $(n - r + 1)$ terms in the expansion
of $(p + q)^n$, that is,
$p^n + {}_nC_1\,p^{n-1}q + {}_nC_2\,p^{n-2}q^2 + \cdots + {}_nC_r\,p^r q^{n-r}$.
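
Theorems III and IV lend themselves to direct computation. The following Python sketch is added for illustration only; the function names `prob_exactly` and `prob_at_least` are hypothetical, and exact fractions are used so that the results can be compared with hand calculation.

```python
from fractions import Fraction
from math import comb

def prob_exactly(n, r, p):
    """Theorem III: probability of exactly r occurrences in n trials."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

def prob_at_least(n, r, p):
    """Theorem IV: probability of r or more occurrences in n trials."""
    return sum(prob_exactly(n, k, p) for k in range(r, n + 1))

half = Fraction(1, 2)
print(prob_exactly(7, 6, half))   # exactly six occurrences in seven trials
print(prob_at_least(7, 0, half))  # at least none: a certainty
```

`math.comb(n, r)` supplies the number of combinations of n things taken r at a time.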

22. Theoretical Frequencies in Tossing of Coins.—In a single
toss of a single dime the probability of its falling tails is $p = \tfrac{1}{2}$.
The probability of its falling heads, that is, failing to fall tails, is
$q = \tfrac{1}{2}$. In tossing one dime seven times, or seven dimes once,
the probability of all tails, no heads, is $p^7 = (\tfrac{1}{2})^7 = \tfrac{1}{128}$, by
Theorem I.

The probability that tails will occur six times and only six
times, that is, that heads will occur once and only once, is
${}_7C_6\,p^6 q^1 = 7(\tfrac{1}{2})^6(\tfrac{1}{2}) = \tfrac{7}{128}$, by Theorem III. Similarly, the probability
of getting exactly

5 tails and 2 heads is ${}_7C_5\,p^5 q^2 = 21(\tfrac{1}{2})^5(\tfrac{1}{2})^2 = \tfrac{21}{128}$,

4 tails and 3 heads is ${}_7C_4\,p^4 q^3 = 35(\tfrac{1}{2})^4(\tfrac{1}{2})^3 = \tfrac{35}{128}$,

3 tails and 4 heads is ${}_7C_3\,p^3 q^4 = 35(\tfrac{1}{2})^3(\tfrac{1}{2})^4 = \tfrac{35}{128}$,

2 tails and 5 heads is ${}_7C_2\,p^2 q^5 = 21(\tfrac{1}{2})^2(\tfrac{1}{2})^5 = \tfrac{21}{128}$,

1 tail and 6 heads is ${}_7C_1\,p\,q^6 = 7(\tfrac{1}{2})(\tfrac{1}{2})^6 = \tfrac{7}{128}$,

and of getting no tails and all heads is $(\tfrac{1}{2})^7 = \tfrac{1}{128}$.

By Theorem IV, the probability of tails occurring at least no
times (r = 0), in other words, the probability of heads occurring
at most seven times, is the sum of all of the (n + 1) = 8 terms of
the expansion $(p + q)^n = (\tfrac{1}{2} + \tfrac{1}{2})^7$. It is certain that heads
will occur seven times at most. The sum, therefore, of the eight




probabilities of the preceding paragraph should be 1, indicating 
certainty. 

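\[
\begin{aligned}
(\tfrac{1}{2} + \tfrac{1}{2})^7 &= (\tfrac{1}{2})^7 + 7(\tfrac{1}{2})^6(\tfrac{1}{2}) + 21(\tfrac{1}{2})^5(\tfrac{1}{2})^2 + 35(\tfrac{1}{2})^4(\tfrac{1}{2})^3 \\
&\quad + 35(\tfrac{1}{2})^3(\tfrac{1}{2})^4 + 21(\tfrac{1}{2})^2(\tfrac{1}{2})^5 + 7(\tfrac{1}{2})(\tfrac{1}{2})^6 + (\tfrac{1}{2})^7 \\
&= \tfrac{1}{128} + \tfrac{7}{128} + \tfrac{21}{128} + \tfrac{35}{128} + \tfrac{35}{128} + \tfrac{21}{128} + \tfrac{7}{128} + \tfrac{1}{128} \\
&= \tfrac{128}{128} = 1.
\end{aligned}
\]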

Tabulating these separate probabilities, Table XVI results.


Table XVI.—Probabilities of Occurrence of Different Combinations
of Heads and Tails in Seven Throws of a Coin or in One Throw
of Seven Coins

Heads    Tails    Probability of occurrence    Per cent of total

  0        7              1/128                       0.78
  1        6              7/128                       5.47
  2        5             21/128                      16.41
  3        4             35/128                      27.34
  4        3             35/128                      27.34
  5        2             21/128                      16.41
  6        1              7/128                       5.47
  7        0              1/128                       0.78

Total                   128/128 = 1                 100.00




These eight probabilities are then proportional, respectively, to 
the numbers 1, 7, 21, 35, 35, 21, 7, 1, or the percentages of the last
column of Table XVI. In a very large number of trials the 
theoretical distribution of number of heads calls for frequencies 
proportional to these numbers. 

23. Comparison with Actual Trial.—If, for example, the 500
trials had followed the law of probabilities as closely as possible,
the number of times 0 heads, seven tails occurred would have
been 0.78 per cent of 500, or 4. The number of times one head,
six tails occurred would have been 5.47 per cent of 500, or 27.
Compute the theoretical frequency for each of the eight possible
combinations of heads and tails. Table XVII displays the
theoretical frequency f'; the actual frequency f that occurred in
throwing seven dimes 500 times; and the correction to be added to
f to give f'.
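
The computation asked for here can be sketched in Python. This fragment is an illustration added to the text; the variable names are chosen for the example, and the actual frequencies are those of Table XV.

```python
from math import comb

actual = [1, 31, 76, 146, 144, 76, 23, 3]  # f, from Table XV
n_coins, n_throws = 7, 500

# Theoretical frequency f' for k heads: 500 * C(7, k) / 2**7, to the nearest integer.
theoretical = [round(n_throws * comb(n_coins, k) / 2**n_coins)
               for k in range(n_coins + 1)]

# Correction to be added to f to give f'.
corrections = [fp - f for fp, f in zip(theoretical, actual)]

print(theoretical)
print(corrections)
```

Because both columns total 500, the positive and negative corrections must cancel, which gives a convenient check on the arithmetic.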










While 500 seems like a fairly large number, it is by no means 
what could be called a very large number. The results of the 500 
trials are fairly close to the theoretical results called for from a 
very large number of trials, as shown in Table XVII. 


Table XVII.—Comparison of Theoretical Frequency with Actual
Frequency from Throwing Seven Dimes 500 Times

Number of heads    Theoretical    Actual frequency    Correction to be
                   frequency      (from Table XV)     added to f to give f'
       X               f'                f

       0                4                1                  +3
       1               27               31                  -4
       2               82               76                  +6
       3              137              146                  -9
       4              137              144                  -7
       5               82               76                  +6
       6               27               23                  +4
       7                4                3                  +1

     Totals           500              500               +20 -20


24. Binomial Distribution.—The theoretical distribution of
frequencies derived from the expansion of $(\tfrac{1}{2} + \tfrac{1}{2})^7$ is perfectly
symmetrical. There are relatively few of the extreme values of
the variable, and relatively many of the medium-sized values,
the maximum number occurring at exactly the middle of the table
of frequencies. This distribution of frequencies is called the
symmetric binomial distribution. It will be further discussed in
the next chapter. If p and q are unequal and the frequencies of a
distribution are proportional to the terms of the expansion of
$(p + q)^n$, the distribution of frequencies will no
longer be symmetrical. This will always be true if $p \neq q$ and n
is not too large.
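
The loss of symmetry when p and q differ can be seen by computing the terms of the expansion. In the Python sketch below, added for illustration, the value p = 1/3 (so that q = 2/3) and the exponent 7 are assumptions chosen for the example, and the function name `binomial_terms` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_terms(n, p):
    """Terms of the expansion of (p + q)**n with q = 1 - p, from p**n down to q**n."""
    q = 1 - p
    return [comb(n, k) * p**(n - k) * q**k for k in range(n + 1)]

symmetric = binomial_terms(7, Fraction(1, 2))
skewed = binomial_terms(7, Fraction(1, 3))  # illustrative unequal p and q

print(symmetric == symmetric[::-1])  # the terms read the same in either direction
print(skewed == skewed[::-1])        # the terms do not
```

In both cases the terms still add to 1, since p + q = 1.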

25. Comparison with Freshman Heights.—Now examine the
distribution of freshman heights as displayed in Table XIV.
There being 11 classes, assume 11 different possible heights,
the midpoints of the classes, constituting a discrete
series in which each differs from the next by one class interval. This
class interval, 1.5 in., may be regarded as the unit of measurement.
For a comparable symmetrical binomial distribution,




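the frequencies will be proportional to the 11 terms of the
expansion of $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$:

\[
\begin{aligned}
&\tfrac{1}{1024} + \tfrac{10}{1024} + \tfrac{45}{1024} + \tfrac{120}{1024} + \tfrac{210}{1024} + \tfrac{252}{1024} + \tfrac{210}{1024} \\
&\quad + \tfrac{120}{1024} + \tfrac{45}{1024} + \tfrac{10}{1024} + \tfrac{1}{1024} = \tfrac{1024}{1024}.
\end{aligned}
\]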


The frequencies of the symmetric binomial distribution will, 
therefore, be proportional to the numbers in Table XVIII. 


Table XVIII

Number    Per cent of 1,024

     1            0.1
    10            1.0
    45            4.4
   120           11.7
   210           20.5
   252           24.6
   210           20.5
   120           11.7
    45            4.4
    10            1.0
     1            0.1

Totals 1,024    100.0


The theoretical frequency to the nearest integer for the first
class is 0.1 per cent of 629, or 1; for the second class is 1.0 per cent
of 629, or 6; for the third class is 4.4 per cent of 629, or 28; and so
on.

Table XIX shows the values of f', f, and the corrections, as
was shown in Table XVII for the results of throwing the coins.

From an examination of Table XIX it is seen that the 
distribution of heights of the 629 freshmen is fairly close to the 





Table XIX.—Comparison of Frequency Distribution of Freshman
Heights and Symmetric Binomial Distribution

Midpoints of    Theoretical    Frequency of        Correction to be
classes         frequency      freshman heights    added to f to give f'
     X              f'               f

   60.5              1               3                  - 2
   62.0              6               8                  - 2
   63.5             28              33                  - 5
   65.0             73              51                  +22
   66.5            129             115                  +14
   68.0            155             156                  - 1
   69.5            129             148                  -19
   71.0             73              64                  + 9
   72.5             28              43                  -15
   74.0              6               5                  + 1
   75.5              1               3                  - 2

  Totals           629             629               +46 -46


theoretical, chance distribution resulting from throwing 10 coins 
a very large number of times. 

26. Non-symmetric Freshman Weights.—As was previously
noted, the distribution of freshman weights was not symmetrical.
Perhaps values of p and q ($p \neq q$) could be found such that the terms
of the binomial expansion of $(p + q)^n$ would be closely proportional
to the frequencies of freshman weights. This involves a problem
in curve fitting which will not be treated here.

27. Experience shows that very many of the variables dealt
with by the statistician have frequency distributions approximating
the symmetric binomial distribution. Certain statistical
constants are derived on the assumption of such distribution.

28. Double Entry.—If two variable characteristics of the same
set of things are measured, the values of the variables may be
tabulated both together in such a manner as to
show whether a change in one variable tends to accompany a
corresponding change in the other or not. The sheet is ruled
into cells, making a table of double entry. One
variable is tabulated horizontally and the other vertically. Table
II of Chap. I is a table of this kind. The method can best
be illustrated by an example. Suppose the lengths and breadths





Table XX.—Lengths and Breadths of 60 Leaves Selected at Random
from a Single Tree

All measures are to the nearest millimeter

Length  Breadth    Length  Breadth    Length  Breadth
   l       b          l       b          l       b

  21      16         33      27         40      26
  21      18         33      28         41      27
  23      19         34      28         41      25
  24      22         34      31         41      25
  25      19         34      30         42      30
  25      22         35      25         42      29
  26      21         35      23         43      29
  26      21         35      23         43      28
  27      24         35      25         44      31
  27      25         36      27         44      30
  27      25         36      24         44      31
  28      27         37      26         45      25
  30      20         37      28         45      27
  30      22         37      26         45      27
  30      22         37      28         46      26
  31      23         38      30         46      28
  31      25         38      29         47      30
  31      25         38      29         48      33
  31      25         39      32         50      32
  33      26         40      22         53      34


of 60 leaves, selected at random from a tree, are measured and 
tabulated, the results being shown in Table XX. They are 
tabulated according to length. It may be noticed that with 
increase of length there is not always an increase of breadth, 
though, in general, the leaves get wider as they get longer. 

29. Construction of Table.—Draw up a diagram of cells for a
table as suggested above, tabulating lengths vertically and
breadths horizontally. This provides a column of cells for
each possible breadth from the smallest that occurs to the largest.
Similarly, there is a line of cells for each possible length from the
smallest that occurs to the largest. Call this Table XXI.
Using Table XX for each pair of length and breadth (l, b), make
a check mark in Table XXI in that cell which is in the line l and
in column b. The method is readily seen by going through Table
XX and following the results in Table XXI.
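
The placing of check marks can be sketched in Python with a counter keyed by cell. This is only an illustration added here: it uses the first eight pairs of Table XX, and the layout printed is a rough stand-in for the ruled sheet, not a reproduction of Table XXI.

```python
from collections import Counter

# First eight (length, breadth) pairs of Table XX, in millimeters.
pairs = [(21, 16), (21, 18), (23, 19), (24, 22),
         (25, 19), (25, 22), (26, 21), (26, 21)]

# One check mark per pair; a cell is identified by (line l, column b).
cells = Counter(pairs)

lengths = range(min(l for l, _ in pairs), max(l for l, _ in pairs) + 1)
breadths = range(min(b for _, b in pairs), max(b for _, b in pairs) + 1)

print("l\\b " + " ".join(f"{b:>2}" for b in breadths))
for l in lengths:
    row = " ".join(f"{cells[(l, b)] or '.':>2}" for b in breadths)
    print(f"{l:>3} {row}")
```

Empty cells are shown as dots; a cell that received two check marks, such as line 26 and column 21, shows the count 2.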













Table XXI.—Lengths and Breadths of 60 Leaves Selected at Random
from a Single Tree

All measurements to the nearest millimeter

[Table of score marks: one line of cells for each length l and one column of cells for each breadth b, with totals.]












































Table XXII.—Lengths and Breadths of 60 Leaves Selected at
Random from a Single Tree

All measurements to the nearest millimeter

[Double-entry table: lengths l in lines and breadths b in columns; each cell holds the number of leaves having that length and breadth, with totals for each line and each column.]











































Table XXI might have been prepared directly when the original
measurements were made, not making Table XX at all.

From this table prepare another table exactly like it, except 
that the entry in each cell is the sum of the score marks in Table 
XXI. This is Table XXII. 

30. Scatter Diagram.—The entries in Table XXII tend to
follow the diagonal down to the right, there being no entries in
the lower left-hand part, nor in the upper right-hand part of the
table. These facts show that short leaves are in general narrow
and long leaves are wide, and that no long leaves are narrow or
wide leaves short. It is thus seen at a glance of the eye that there
is a strong tendency for variable b, breadths, to increase with
increase of variable l, lengths. If the entries had followed the
other diagonal, it would have shown that as one variable increased,
the other decreased.

In the chapter on correlation this table will be used to illustrate
the method of getting a measure of the tendency of one variable
to change along with another. Such a table as Table XXII is
called a scatter diagram or correlation table.

31. Frequency Tables.—A frequency table for each variable 
can now be easily drawn up from Table XXII. Having decided 
on the class limits for each frequency table, draw, in Table XXII, 
heavy vertical lines at the class boundaries of the horizontal 
variable, and heavy horizontal lines at the class boundaries of 
the vertical variable. The sum of the entries between heavy 
lines in the vertical column of totals gives the frequency column 
for the vertical variable. In the same way, the sum of entries 
between heavy lines in the horizontal line of totals gives the 
frequency column for the horizontal variable. Table XXIII and 
Table XXIV are thus derived from Table XXII. 
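
The two frequency tables can be checked by machine. The Python sketch below is an illustration added here; it tallies the 60 pairs of Table XX directly rather than summing the totals of Table XXII between heavy lines, which gives the same frequencies. The function name `marginal_table` is hypothetical.

```python
from collections import Counter

# The 60 (length, breadth) pairs of Table XX, in millimeters.
pairs = [
    (21, 16), (21, 18), (23, 19), (24, 22), (25, 19), (25, 22), (26, 21), (26, 21),
    (27, 24), (27, 25), (27, 25), (28, 27), (30, 20), (30, 22), (30, 22), (31, 23),
    (31, 25), (31, 25), (31, 25), (33, 26), (33, 27), (33, 28), (34, 28), (34, 31),
    (34, 30), (35, 25), (35, 23), (35, 23), (35, 25), (36, 27), (36, 24), (37, 26),
    (37, 28), (37, 26), (37, 28), (38, 30), (38, 29), (38, 29), (39, 32), (40, 22),
    (40, 26), (41, 27), (41, 25), (41, 25), (42, 30), (42, 29), (43, 29), (43, 28),
    (44, 31), (44, 30), (44, 31), (45, 25), (45, 27), (45, 27), (46, 26), (46, 28),
    (47, 30), (48, 33), (50, 32), (53, 34),
]

def marginal_table(values, first_lower, interval, n_classes):
    """Frequency of `values` in consecutive classes starting at `first_lower`."""
    tally = Counter((v - first_lower) // interval for v in values)
    return [tally.get(i, 0) for i in range(n_classes)]

# Classes 20-24, 25-29, ... for lengths; 15-17, 18-20, ... for breadths.
length_f = marginal_table([l for l, _ in pairs], first_lower=20, interval=5, n_classes=7)
breadth_f = marginal_table([b for _, b in pairs], first_lower=15, interval=3, n_classes=7)

print(length_f)
print(breadth_f)
```

Each list of frequencies should total 60, the number of leaves measured.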


Table XXIII.—Frequency Table of Leaf Lengths

Classes (size of item in millimeters)    Frequency
                 l                           f

               20-24                          4
               25-29                          8
               30-34                         13
               35-39                         14
               40-44                         12
               45-49                          7
               50-54                          2

               Total                      N = 60






Table XXIV.—Frequency Table of Leaf Breadths

Classes (size of item in millimeters)    Frequency
                 b                           f

               15-17                          1
               18-20                          4
               21-23                         10
               24-26                         17
               27-29                         16
               30-32                         10
               33-35                          2

               Total                      N = 60




Exercises 


1. Taking Table II, draw horizontal and vertical lines separating the
classes of Table XIII and Table XIV respectively. Check the frequency
columns of Tables XIII and XIV.

2. Make a frequency table of mean monthly temperatures at Seattle as
recorded in Table IV. Use a 5° class interval. Let the class limits for the
first class be 30.0-34.9°.

3. Make a frequency table of monthly precipitation in inches at Seattle
as recorded in Table III. Use a 1-in. class interval. Let the class limits
for the first class be 0.00-0.99 in.

4. From Table V construct a frequency table for all possible numbers of
heads that could occur. Does the result check Table XV?

5. Make a scatter diagram of Seattle mean monthly temperature and
precipitation as shown in Table III and Table IV. Make the horizontal
variable precipitation in inches, and the vertical variable mean monthly
temperature in degrees Fahrenheit.

6. Make a frequency table of the mean monthly temperature in Seattle
as shown in Table IV. Use a 2° class interval.

7. Using Table XXII and a class interval of 5 mm., construct five different
frequency tables of leaf lengths. One of these is the same as Table
XXIII. If it is desirable to have the frequencies increase rather steadily
up to a maximum and then rather steadily decrease, which of the five tables
is best to use?

8. Using Table XXII and a class interval of 3 mm., construct three different
frequency tables of leaf breadths. If it is desirable to have the frequencies
increase rather steadily up to a maximum and then rather steadily
decrease, which of the three tables is best to use?

9. Using Table XXII, construct two frequency tables of leaf lengths, one
with a class interval of 2 mm., and the other with a class interval of 6 mm.
In each table let the lower limit of the first class be 20 mm. In which table
is there greater regularity of increase and decrease of frequencies?








10. From the table constructed in Ex. 14, Chap. I, make a frequency table 
of number of heads similar to Table XV. 

11. Expand $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$ and so determine the theoretical frequencies for
the number of heads in 500 tosses of 10 coins each.

12. Make a table like Table XVII, comparing the actual results of Ex. 
10 with the theoretical results of Ex. 11. 

13. Obtain from the 1920 census reports, or elsewhere, the population of 
each of the states and their areas. Tabulate the data with reference to size 
of population, as in Table VII. 

14. Obtain the area of each of the states and tabulate the data according 
to size of the states. 

15. From the data of the tables in Exs. 13 and 14 complete Table IX for
the entire country.

16. Rearrange the table of Ex. 15 according to density of population, 
beginning with the state having the greatest density. 

17. From census reports, or elsewhere, obtain the population of each of 
the states for 1910 and for 1920. Compute the per cent increase for each 
state. Tabulate with respect to per cent increase, beginning with the state 
having the greatest per cent increase. 

18. From a tree select a single branch containing about 300 full-grown
leaves. Try to select a branch so that the leaves on it are representative of
the entire universe of leaves on the tree. Measure the length and breadth
of each leaf, recording in the manner of Table XXI. From this record complete
a table in the form of Table XXII.

19. From the table of Ex. 18 construct a frequency table of leaf lengths
having about 10 or 12 classes.

20. From the table of Ex. 18 construct a frequency table of leaf breadths 
having about 10 or 12 classes. 



CHAPTER III 


GRAPHICAL REPRESENTATION, FREQUENCY GRAPHS 

A graphical representation of statistical data is variously called
graph, chart, or diagram. There are many forms. Several very
good books on this subject have recently been published. 1
Some of the commoner forms will be considered in this chapter.

1. Object of a Graph or Chart.—The object of a graph or chart 
is to show to the eye the changes of a variable so as readily to 
compare one variable with another. By means of the graph or 
chart a comprehensive view of quantitative data is obtained which 
is much more easily remembered than are the tabulated numerical 
values. For this reason graphs are very valuable in presenting 
statistical data to the general public in lectures, publications, 
window displays, or other exhibits. Frequently relations are 
easily discovered by graphs which would not be readily seen from 
the figures themselves. 

This makes the graph of value to the business executive, the 
engineer, and the research student in nearly all lines. Study of 
graphs of time variables reveals what has happened in the past 
and frequently enables one to predict what will occur in the future 
if the same influences are continued. This is of great value to 
the financier, the economist, the executive, and many others. 
The remarks in Chap. I in regard to title, arrangement, and 
accuracy should be reviewed before making a finished graph. 

2. Bar Diagram.—The simplest form of statistical graph is
the bar diagram. The numbers in a table of values of a variable 
are represented by horizontal or by vertical lines or bars whose 
lengths, measured to some scale, are proportional to the numbers 

1 “Charts and Graphs,” Karl G. Karsten, 724 pp., published by
Prentice-Hall. “How to Make and Use Graphic Charts,” 539 pp., and
“Graphic Charts in Business,” 250 pp., both by Allan C. Haskell, and
published by Codex Book Company. “Graphical Methods,” William C.
Marshall, 253 pp., published by McGraw-Hill Book Company, Inc. The
last two contain bibliographies of books, pamphlets, and articles relating
to the subject. “The Statistical Atlas of the United States” contains many
forms not treated in this chapter.




represented. This is illustrated by the data of Table XXV, 
Fig. 1 and Fig. 2. 

Figure 1 shows the horizontal representation of the data and
Fig. 2 the vertical-bar representation. Which to use may usually
be left to the judgment. It depends on the quantity of data,

Fig. 1.—Savings banks deposits in the United States. (Scale: hundreds of millions of dollars.)

Fig. 2.—Savings banks deposits in the United States.


the scale to be used, and the size and shape of the sheet. For a 
simple sequence of numbers, such as in Table XXV, one way 
seems to show the changes in the variable about as well as the 
other. For public displays, horizontal bars are very frequently 





Table XXV.—Savings-banks Deposits in United States

Year        Deposits

1880    $  819,106,973
1890     1,524,844,506
1900     2,449,547,885
1910     4,070,486,246
1920     6,536,470,000


used. For office study, vertical bars, or graphical forms, to be
shown later, are very common.

If the vertical bars, in this illustration, had been placed as
close together as were the horizontal bars, the figure would not
have been well proportioned. Spacing between the bars should be
uniform, as should also be the width of bars. Width of bars,
spacing between the bars, and a proper scale should be determined
from a study of the data and the size and shape of the
finished chart.
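
The idea of bar lengths proportional to the numbers represented can be tried with a rough character chart. The Python sketch below is an illustration added here; the scale of one character per hundred million dollars is an assumption chosen for the example, as are the variable names.

```python
# Savings-banks deposits from Table XXV, in dollars.
deposits = {1880: 819_106_973, 1890: 1_524_844_506, 1900: 2_449_547_885,
            1910: 4_070_486_246, 1920: 6_536_470_000}

SCALE = 100_000_000  # one character of bar length per hundred million dollars

# Bar length for each year, measured to the chosen scale.
bar_lengths = {year: round(amount / SCALE) for year, amount in deposits.items()}

# Horizontal bars, uniformly spaced, one line per year.
for year, length in bar_lengths.items():
    print(f"{year} | {'#' * length}")
```

A coarser or finer scale changes the size of the figure but not the proportions among the bars, apart from rounding.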

3. Several Variables.—Two or more variables may be represented
in the same chart by representing each by bars of different
color or cross-hatched in a different manner. Showing too many
variables in the same chart makes the chart confusing.

4. Subdivisions.—If the variable to be represented by bars 
may be divided into several parts, the bar corresponding to each 
value of the variable may be divided into parts whose lengths 


Table XXVI.—Grain Crops of Four States

              State A             State B             State C             State D
          Millions  Per cent  Millions  Per cent  Millions  Per cent  Millions  Per cent
Grain        of        of        of        of        of        of        of        of
          bushels    total    bushels    total    bushels    total    bushels    total

Wheat         5        25        15        43        20        80        23        51
Corn         10        50         9        26         3        12        12        27
Oats          5        25        11        31         2         8        10        22

Totals       20       100        35       100        25       100        45       100


correspond to the values of the parts of the variable. For
example, suppose the population of a city is 350,000, of which
290,000 are whites, 30,000 negroes, 20,000 Orientals, and 10,000
Indians. The bar representing 350,000 would be divided into
parts proportional to the numbers 290,000, 30,000, 20,000,
and 10,000, each part being of a different color or cross-hatched
differently. The parts of all bars should be arranged in the same
order. Those parts at the beginning of the bars are easily compared.
Those farther on are not so easily compared. A legend
should be made in a convenient place (usually the lower right-hand
part of the sheet), showing the meaning of each color or



[Legend: Wheat, Corn, Oats]


Fig. 3.—Grain crops of four states. 



Fig. 4.—Grain crops of four states. Percentages. 


cross-hatching. Figure 3 illustrates this variety of chart, the 
data being taken from Table XXVI. 

5. One-hundred Per Cent Bars.—Sometimes it is desired to
portray each part as a per cent of the whole. Then each bar is 
a 100 per cent bar and all are the same length. The parts of each 
bar are proportional to the corresponding percentages. Figure 
4 is a percentage chart of the data shown in Fig. 3. The per¬ 
centages are tabulated in Table XXVI. 
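The percentages of Table XXVI can be recomputed from the bushel figures. The following is a minimal sketch in Python, not part of the original text; the names `crops` and `percent_of_total` are illustrative assumptions. Rounded percentages need not always add to exactly 100, although they do for this table.

```python
# Crops in millions of bushels, copied from Table XXVI.
crops = {
    "A": {"Wheat": 5, "Corn": 10, "Oats": 5},
    "B": {"Wheat": 15, "Corn": 9, "Oats": 11},
    "C": {"Wheat": 20, "Corn": 3, "Oats": 2},
    "D": {"Wheat": 23, "Corn": 12, "Oats": 10},
}


def percent_of_total(parts):
    """Return each part as a whole-number per cent of the sum of the parts."""
    total = sum(parts.values())
    return {name: round(100 * amount / total) for name, amount in parts.items()}


percentages = {state: percent_of_total(parts) for state, parts in crops.items()}

for state, row in percentages.items():
    # Each row gives the divisions of one 100 per cent bar.
    print(f"State {state}: {row}")
```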

6. Pie Diagrams.—Another form of graphic representation is 
known as the pie diagram. This may be used for variables that 
are divided into parts such as the various grain crops of different
states of Table XXVI. Using the data of this table for
illustration, draw a circle for each state. The areas of these 
circles are made proportional to the total grain crops of the 
respective states. This requires the radii to be proportional to 
the square roots of the total crop for each of the respective 
states. The circumference of each circle is divided into segments 
proportional to the amounts of each crop in the corresponding 
state. Radii are drawn to the points of division. This makes 


[Figure title: Grain Crops of Four States]





the amount of any crop proportional to the area of the corresponding
circle sector. The results are shown in Fig. 5.
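The construction may be checked numerically. The sketch below is an illustration in Python, not part of the original text; `UNIT_RADIUS` (the radius assigned to a total of one million bushels) and the function names are assumptions. It shows that the sector areas come out proportional to the crops, while the corn sector of State A has the smaller radius but the larger angle when compared with State D.

```python
import math

# Totals and corn crops in millions of bushels, from Table XXVI.
totals = {"A": 20, "B": 35, "C": 25, "D": 45}
corn = {"A": 10, "B": 9, "C": 3, "D": 12}

UNIT_RADIUS = 1.0  # assumed radius of a circle standing for 1 million bushels


def pie_radius(total):
    """Radius proportional to the square root of the total, so area ~ total."""
    return UNIT_RADIUS * math.sqrt(total)


def sector_angle(part, total):
    """Central angle, in degrees, of the sector standing for one part."""
    return 360.0 * part / total


def sector_area(part, total):
    """Area of the sector; it reduces to pi * UNIT_RADIUS**2 * part."""
    return math.pi * pie_radius(total) ** 2 * part / total


for state, total in totals.items():
    print(
        f"State {state}: radius {pie_radius(total):.2f}, "
        f"corn angle {sector_angle(corn[state], total):.0f} degrees, "
        f"corn area {sector_area(corn[state], total):.1f}"
    )
```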

For practical purposes of comparison of each crop in different 
states, this kind of diagram is useless. For example, the corn 
crops of A and D are difficult to compare without referring to 
the numbers entered on the circle sectors, or to the length of the 
bar segments of Fig. 3. The radius of the corn sector in A is less 
than that in D, while the arc is greater. This makes it almost 
impossible to compare the areas by eye. 




7. Percentage Pie Diagrams.—The case is not quite so bad if 
percentages are used. Each circle is then a 100 per cent circle 
and all have the same radius which is purely arbitrary in length. 

The result is shown in Fig. 6. To compare percentages of 
different crops in the same state, or of the same crop in different 
states, it is only necessary to compare lengths of arcs or size of 
angle at the center instead of different shaped areas. Lengths of 
straight lines are more easily compared by the eye than lengths 
of curved lines or sizes of angles. So for the purposes of study in 
the office, or for research, the pie diagram is nearly worthless. 





Fig. 6. 

8. Use of Pie Diagrams.—The pie diagram, however, is very 
popular for display purposes. It catches the public eye. It 
appeals to the popular mind, especially in representing financial 
data. An example is the division of the working-man’s dollar, 
showing what per cent goes for food, what for rent, etc. The 
dollar is round; so is the pie diagram. The dollar is divided 
into a hundred cents. The per cent pie diagram is divided into 
a hundred equal parts. Thus the division of the dollar is easily 
understood by means of this sort of diagram. If a number of 
charts are being displayed, it may rest the eye and hold attention 
if the form of chart is varied from time to time. The pie diagram 
may well serve this purpose. 





9. Pictograms.—Sometimes resort is made to pictures for 
representing the various sizes of the variable, the size of 
the picture being proportional to the value represented. Such 
graphs may be called pictograms. As illustrative examples, may 
be mentioned size of armies by pictures of soldiers proportioned 
to the sizes of armies represented, or growth of the navy by pic¬ 
tures of battleships, or comparison of wheat crops of the countries 
of the world by varying sizes of sacks of wheat, or varying sizes 
of piles of sacks. Many examples could be mentioned. 



Fig. 7.

Fig. 8.





10. Methods of Representation.—As an example represent the 
size of an army by a picture of a soldier. Suppose army A has 
500,000 men and army B has 1,000,000 men. Army B is twice 
as large as army A. (1) Shall the picture of soldier B be 
twice as tall as soldier A? (2) Or shall the picture of soldier B be 
twice as big (twice the area) as that of soldier A? (3) Or shall we 
assume soldier B to be twice as big (twice as heavy) as soldier A? 

If the first method of representing the sizes of the armies by 
the tallness of the soldier is used, the other dimensions should 
strictly remain constant. Then soldier A will look very fat or 
soldier B will look very thin, making the pictures grotesque, as
shown in Fig. 7.

To make the pictures look well, it will be necessary to increase 
the width of B’s picture in the same ratio as the height. The 





result is shown in Fig. 8, where B is twice as tall and twice as wide 
as A. But now, the pictures of A and B being the same shape 
with the dimensions of the picture of B twice the corresponding 
dimensions of that of A, the area of the picture of B is four times 
that of A. People accustomed to comparing areas will see at a 
glance that the size of the picture of B is about four times that of A. 
Many people would interpret the picture of B as being more than 
twice the size of that of A but not four times as large. A smaller 
percentage of people, accustomed to comparing volumes, would 
interpret soldier B as being not only twice as high and twice as 
wide as soldier A but also twice as thick. In that case B would 



Fig. 9.

Fig. 10.






be judged to be eight ($2^3$) times as large as A. It is very evident,
then, that Fig. 8 would be deceptive to most people. Some
few would say that B is about eight times as large as A. More 
would think that B is about four times as large as A. Still others 
would judge B to be about three times as large as A. In most 
cases, only by mental effort would the heights alone be compared 
and the true result be obtained. 

Perhaps the best method would be to make the area of the picture
of B twice that of A, method (2). In this case the ratio of
corresponding dimensions will be $1:\sqrt{2}$. The result is shown in
Fig. 9. Of course, there is now difficulty for those people that
would consider three dimensions and thus compare size or weights
of the two soldiers, and for those people that would compare
heights only.

If the sizes or weights of the two soldiers are compared, the
corresponding dimensions of the pictures should be in ratio $1:\sqrt[3]{2}$,
method (3). Figure 10 has been made on this supposition.
Probably but few people looking at this picture would interpret 
soldier B as being twice as large as soldier A. Nevertheless, 
this is the true picture if the two soldiers shown are of the same 
shape. 
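The three ratios may be put side by side. The sketch below is an illustration in Python, not part of the original text; the function name `linear_scale` is an assumption. It supposes that the picture keeps its shape, so that every length is multiplied by the same factor, as in Figs. 8, 9, and 10.

```python
def linear_scale(ratio, dimensions):
    """Factor applied to every length so that the quantity compared
    (height = 1, area = 2, volume = 3 dimensions) grows by `ratio`."""
    return ratio ** (1.0 / dimensions)


ratio = 1_000_000 / 500_000  # army B relative to army A

for label, dimensions in (("(1) height", 1), ("(2) area", 2), ("(3) volume", 3)):
    k = linear_scale(ratio, dimensions)
    # For each method, show how height, area, and volume of B compare with A.
    print(
        f"method {label}: lengths x{k:.3f}, "
        f"area x{k ** 2:.3f}, volume x{k ** 3:.3f}"
    )
```

Whichever method is chosen, only one of the three comparisons gives the true ratio 2; the other two mislead the reader who makes them.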

11. Use with Caution.—The conclusion may readily be drawn 
that the pictogram is not a desirable form of presentation of statis¬ 
tical data if accuracy of representation is desired. The picto¬ 
gram makes a very popular form of appeal and is almost sure to 
attract the attention. No matter which of the above methods 
of construction is used, the pictogram is almost sure to be decep¬ 
tive to certain classes of people. For advertising purposes, it 
should, therefore, be used only with great caution. A person 
observing pictograms should be able to decide which method of 
construction has been used and interpret the pictures accordingly. 

12. Simplest Form Best.—Since practically everyone can, by 
the eye, compare areas more easily than volumes, and lengths 
more easily than areas, those forms of graphical representation 
which require the comparison of lengths only are the simplest and 
best to use. It happens that they are also the easiest to 
construct. Others may sometimes be resorted to for display 
purposes, to attract attention, or when the spectacular is desired. 
The general public is rapidly becoming accustomed to graphical 
data that are to be compared. This fact is eliminating the 
necessity of the spectacular, for gaining attention. 

13. Functional Relationship.—In studying the changes of 
variables, it has been seen that the value of a variable changes 
along with the change in value of another variable. The 
historical or time variable changes with change of time. In a 
frequency distribution, the value of frequency varies with the 
size of the object measured. When two variables are so related 
that the value of one depends on the value of the other, the first 
variable is called a function of the second. The variable whose 
value depends on that of the other is also called the dependent
variable, the other the independent variable. Thus in Table XXV
the amount of savings-banks deposits may be regarded as a function
of the time or date. It would then be called the dependent
variable depending on time, the independent variable. The
amount of deposits appears as a function of time, in this case
always increasing with increase of time. A very good example of
functional relationship is the distance passed over by a falling
body in a given length of time. In this case the functional
relationship may be expressed by a mathematical equation,

$$s = \tfrac{1}{2}gt^2,$$

where s is the number of units of space passed over in t units of
time, and g is a constant depending on gravitational attraction
and the particular units used in measuring space and time. The
dependent variable, space, s, is a function of the independent
variable, time, t. The form of the function is $\tfrac{1}{2}gt^2$, and its value can at
once be determined for any given value of t.

14. Graph of a Function.—In the graphical representation of a 
function the values of the independent variable are usually repre¬ 
sented in the horizontal direction and the values of the function, 
vertically. This is illustrated in Fig. 2. The independent 
variable, time, is plotted on a horizontal scale, and dependent 
variable, savings-banks deposits, on a vertical scale. In Fig. 2 
the only values of the dependent variable shown are represented 
by vertical bars at 10-year time intervals, these being the values 
shown in Table XXV. 

15. Graphical Interpolation.—For any date between the
tabulated dates of Table XXV, there existed values of savings-banks
deposits, but it is not known what the actual values were. In
other words, for values of the independent variable between those 
values tabulated, there correspond values of the dependent 
variable. These intermediate values may be approximated by 
connecting the tops of the bars in Fig. 2 by straight lines. Then, 
at any intermediate date on the horizontal scale, go up to the 
straight line and read the value on the vertical scale at the left. 
This process of obtaining graphically approximate values of the 
dependent variable for values of the independent variable in 
between the tabulated values is called graphical interpolation. 
The method of connecting the points by straight lines, of course, 
assumes a uniform yearly change in savings-banks deposits 
during any given 10-year period. 
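Reading a value from the straight line is the same as straight-line interpolation by arithmetic. The sketch below is an illustration in Python, not part of the original text; the function name `interpolate` is an assumption, and for want of the figures of Table XXV it borrows the deposits column of Table XXVIII of this chapter.

```python
# Deposits in hundreds of millions of dollars at 10-year intervals,
# borrowed from Table XXVIII for illustration.
years = [1880, 1890, 1900, 1910, 1920]
deposits = [8.2, 15.2, 24.5, 40.7, 65.4]


def interpolate(x, xs, ys):
    """Value on the straight line joining the two tabulated points around x.

    Assumes a uniform change between neighboring tabulated values.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated range")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            y0, y1 = ys[i], ys[i + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Approximate deposits halfway between the 1900 and 1910 entries.
print(interpolate(1905, years, deposits))
```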

16. Line Diagram.—It is unnecessary actually to construct the 
vertical bars. Instead, a single point may be plotted for each 
date of the table. A point for any date should be directly above 
the horizontal scale point for that date at a distance equal to the 
amount of savings-banks deposits for that date, as measured on 
the vertical scale. When a point has been plotted for each date 
given in the table, then each point may be connected with the 
next by a straight line. Instead of connecting the points by 





straight lines, a smooth curve may be drawn passing through all 
the points. Figure 11 shows the points for Table XXV plotted 
and connected by straight lines. Such a graph is called a line 
diagram, or peak-top diagram. The name “peak-top” arises
from the fact that a line diagram for a variable which alternately 
increases and decreases has somewhat of a saw-tooth appearance 
with peaks and low points. One author 1 calls these graphs 
curves. 



Fig. 11.—Savings banks deposits in the United States. 

17. Smoothing.—The dotted line of Fig. 11 is a smooth curve 
passing through the plotted points. Interpolation may be made 
to the smooth curve or to the straight lines. Interpolation to the 
smooth curve does not imply a uniform variation between tabular 
dates. Neither does it imply a sudden change in variation at 
tabular dates as does the line diagram. If the nature of the 
changes in the dependent variable between tabular dates is not 
known, it certainly will not be justifiable to draw a smooth curve. 
In that case the straight lines merely serve as a guide to the eye 
in following from one plotted point to another and to give a 
general picture of the change from date to date. Interpolation

1 Karl G. Karsten, “Charts and Graphs.”




would not be justified unless modified by the statement that the 
result is true if the variation from point to point is uniform. If 
reasonable assumptions can be made in regard to the nature of 
the variation of the dependent variable between tabular entries, 
a smooth curve may be drawn to conform to them. Judgment 
and experience are the best guides to follow in smoothing. A 
fairly good rule in many cases is to think of the points as plotted 
on a level field and mark the path of a bicycle ridden through the 
points. There are no sudden changes in direction of the curve. 
In other words, it is smooth at all points. The method of smooth¬ 
ing the graph of a frequency distribution is somewhat different. 

18. Plotting a Function.—In case of an equational relationship
between the independent variable and its function, such as

$$s = \tfrac{1}{2}gt^2,$$

the graph is a determinable, mathematical curve. A great
enough number of points may be determined so that the curve
may be plotted with a high degree of accuracy. Using $s = \tfrac{1}{2}gt^2$
and supposing the conditions to be such that g = 32, the equation
becomes $s = 16t^2$. Make a table of values of the function s
(dependent variable) for a series of values of the independent
variable t, Table XXVII.


Table XXVII.—$s = 16t^2$

   t       s

   0       0
   1      16
   2      64
   3     144
   4     256
   5     400


Choose proper horizontal and vertical scales and plot the
values of the function for the series of values of t = 0, 1, 2, 3, 4,
and 5, as shown in Fig. 12. Where the points are as far apart as
at t = 4, s = 256 and t = 5, s = 400, intermediate values of t
may be chosen, s computed, and the resulting points plotted.
The point t = 4½, s = 324 is such a point. At t = 1, s = 16,
the direction of the curve is changing so rapidly that intermediate
points are again useful. Points at t = ½, s = 4, and at t = 1½,
s = 36 will be convenient to use. Having plotted a sufficient
number of points, the curve may be sketched in or drawn by help 
of a French curve or a flexible ruler. Since s varies not uniformly 
but as the square of t, the points should not be connected by 
straight lines. The curve is called the graph of the function $16t^2$.
The process of constructing the graph is called plotting the function.
With this curve the value of the function s may be at once
read off for any value of t. Graphical interpolation may be made 





anywhere within the limits of the figure and to a degree of accu¬ 
racy depending on the care taken in plotting the points, drawing 
the curve, and reading off the required amounts. 
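The points used in plotting may be computed mechanically. The sketch below is an illustration in Python, not part of the original text; the function name `s` and the lists `whole` and `extra` are assumptions. It reproduces Table XXVII together with the intermediate points mentioned above.

```python
def s(t, g=32.0):
    """Distance fallen in time t, from s = (1/2) g t**2 with g = 32."""
    return 0.5 * g * t ** 2


whole = [0, 1, 2, 3, 4, 5]   # the values of t in Table XXVII
extra = [0.5, 1.5, 4.5]      # intermediate values suggested in the text

# Sort by t so the points can be joined in order by a smooth curve.
points = sorted((t, s(t)) for t in whole + extra)

for t, distance in points:
    print(f"t = {t:>3}   s = {distance:6.1f}")
```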

19. Several Dependent Variables.—With line diagrams, either 
peak-top or curve, two or more variables dependent on the same 
independent variable may be plotted above the same horizontal 
scale. As an example one might use deposits in savings-banks, 




value of agricultural products, and number of college students, 
at different dates. Time would be the independent variable, 
common to all three dependent variables. The data, given in 
Table XXVIII, are graphed in Fig. 13. 


Table XXVIII.—Savings-banks Deposits, Value of Wheat Crop, and
Number of College Students in United States

         Deposits, in hundreds    Value of wheat crop, in    Number of college
Year     of millions of           hundreds of millions of    students, in
         dollars                  dollars                    thousands

1880            8.2                       3.9                      38.2
1890           15.2                       3.2                      55.7
1900           24.5                       4.2                      98.9
1910           40.7                       5.8                     163.0
1920           65.4                      14.9                     441.4



Fig. 13.—Savings banks deposits, value of wheat crop and number of college
students in the United States.










For number of college students a scale different from that for 
money values must be used. This scale has been placed at the 
right. Sometimes when different scales are necessary, they are 
placed side by side at the left of the chart. More than two or 
three scales would be confusing. It is not well to graph too many 
variables in one chart. They become difficult to follow. It is 
better to group them in two or more charts. The advantage in 
plotting several variables in one chart is that their variations can 
be easily compared. In Fig. 13 it is very noticeable that the 
number of college students varies in about the same manner 
as deposits in savings banks. This at once suggests correlation 
between these two variables. Study this graph and see what 
other conclusions can be drawn. 

20. Axes.—In connection with graphs of this sort, certain 
mathematical terms should be recalled from algebra. In Fig. 
12 the horizontal scale line passes through the 0 of the vertical 
scale, and the vertical scale line passes through the 0 of the hori¬ 
zontal scale. These lines through the 0 of both scales are called 
coordinate axes. The horizontal line through the 0 of the vertical 
scale is called the axis of abscissas. The vertical line through the 
0 of the horizontal scale is called the axis of ordinates. In Fig. 
12 the axis of abscissas may also be called the t-axis, and the axis 
of ordinates the s-axis, named from the variables represented. The
axis of abscissas is the axis of the independent variable. The axis 
of ordinates is the axis of the dependent variable or the function 
axis. The point of intersection of the two axes, the common zero 
point of both scales, is called the origin of coordinates or simply 
the origin. In passing, it may be remarked that not in all graphs
are the scale lines drawn through the origin. 

21. Coordinates.—The two numbers which determine the 
position of a plotted point are called the coordinates of the point. 
The coordinates of a point are its distances from the axes, as 
measured by the horizontal and vertical scales used. The dis¬ 
tance from the vertical axis is called the abscissa of the point. 
The distance from the horizontal axis is called the ordinate of 
the point. Points above the axis of abscissas have positive ordi¬ 
nates. Points below the axis of abscissas have negative ordinates. 
Points to the right of the axis of ordinates have positive abscissas. 
Points to the left of the axis of ordinates have negative abscissas. 
The two coordinates of a point are written together in a paren¬ 
thesis with a comma between them, the abscissa first followed 





by the ordinate, thus: (3, 4). The point (3, 4) is three horizontal 
scale units to the right of the axis of ordinates and four vertical 
scale units above the axis of abscissas. The point (-3, 4) is three
horizontal scale units to the left of the axis of ordinates and four
vertical scale units above the axis of abscissas. The points
plotted in Fig. 12 as taken from Table XXVII were (0, 0), (1, 16),
(2, 64), (3, 144), (4, 256), and (5, 400). 

The preceding definitions are according to the usual conven¬ 
tions. The axes might have been assumed in other positions and 
positive and negative in other directions, giving another coordi¬ 
nate system. The usual conventions are implied unless otherwise 
stated. 

22. x and y Independent.—Designate two variables by the 
letters x and y, as is commonly done in algebra. Let x be the 
independent variable and y the dependent variable. The hori¬ 
zontal axis or axis of abscissas is the x-axis. The vertical axis or 
axis of ordinates, the function axis, is the y-axis. The point 
(x, y) is any point in the plane of the axes, with abscissa equal to x 
and ordinate equal to y. If the values of neither x nor y are 
in any way dependent on the value of the other, the point (x, y) 
may wander all over the plane from one place to another by 
merely assuming arbitrary values for x and for y.

23. Straight Line through the Origin.—But suppose that x and
y are subject to the condition that their ratio, y/x, must remain
constant. For example, suppose y/x = 2. Then y = 2x. One
is limited to consideration of that set of points such that the
ordinate of each is just twice its abscissa. For x = 0, y = 0;
x = 1, y = 2; x = 2, y = 4; x = -1½, y = -3, etc. For any
change in x the change in y is just twice as much. In other words
y changes uniformly with a uniform change in x. It follows that
all the points whose coordinates satisfy the condition y = 2x
must lie on a straight line through the point (0, 0), the origin.
Or, consider the matter in this manner. In Fig. 14 take any two
points, $P_1$ and $P_2$, whose coordinates satisfy the condition
y = 2x. Draw the abscissas $OA_1$, $OA_2$ and the ordinates $A_1P_1$,
$A_2P_2$. By the given conditions

$$\frac{A_1P_1}{OA_1} = \frac{A_2P_2}{OA_2} = 2.$$

Therefore, from the similar triangles $OA_1P_1$ and $OA_2P_2$,
the points O, $P_1$, and $P_2$ must lie in a straight line. Since this
conclusion holds true for any two points $P_1$ and $P_2$ of the set of




points satisfying the condition, it holds for all points of the set. 
So all points, whose coordinates satisfy the condition y = 2x, 
must lie on a straight line through (0, 0). Moreover, the coor¬ 
dinates of any point off that straight line will not satisfy the given 
condition. The ratio of ordinate to abscissa will be either more or 
less than 2 according to the side of the line on which the point 
falls.

Fig. 14.

For $P_4$,

$$\frac{A_4P_4}{OA_4} < 2.$$

24. Locus, Slope.—The straight line $P_2P_3$ which contains all of
the points and only those points whose coordinates satisfy the
condition y = 2x is called the locus of the equation y = 2x. It is
the graph of the function 2x. y = 2x is called the equation of
the line $P_2P_3$, the coordinates of every point of which, and only
those points, satisfy the equation. The ratio y/x = 2 is called
the slope of the line. The slope may be obtained by taking any
two points on the line, such as $P_1$ and $P_2$, and finding the ratio of
the difference of the ordinates of these two points to the difference
of their abscissas.

$$\frac{A_2P_2 - A_1P_1}{OA_2 - OA_1} = \frac{CP_2}{A_1A_2} = \frac{CP_2}{P_1C} = \text{slope}.$$

The same arguments apply to the condition y/x = m, any constant.
So the locus of the general equation of this form y = mx
must be a straight line through the origin (0, 0) with slope m.
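The rule for the slope may be tried on the points already found for y = 2x. The sketch below is an illustration in Python, not part of the original text; the function name `slope` is an assumption, and exact fractions are used so that no rounding enters. Every pair of points on the line gives the same slope.

```python
from fractions import Fraction
from itertools import combinations


def slope(p1, p2):
    """Difference of the ordinates divided by the difference of the abscissas."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the abscissas are equal; the slope is not finite")
    return Fraction(y2 - y1) / Fraction(x2 - x1)


# Points whose coordinates satisfy y = 2x, as listed in the text.
points = [(0, 0), (1, 2), (2, 4), (Fraction(-3, 2), -3)]

# The set collapses to a single value when all pairs agree.
slopes = {slope(p, q) for p, q in combinations(points, 2)}
print(slopes)
```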

25. Straight Line Not through the Origin. Slope Form.—
Take the point B (0, 3) on the y-axis. Through B draw a line
parallel to $P_2P_3$. It is evident, at once, that the ordinate of any
point on this line is three units greater than the ordinate of the
point on $P_2P_3$ having the same abscissa. It must be true, therefore,
for every point on this line, and only those points, that

$$y = 2x + 3.$$

The distance 3 from the origin where this line cuts the y-axis
is called the y-intercept. The x-intercept may be found by
putting 0 for y in the equation and solving for x.

$$0 = 2x + 3, \qquad 2x = -3, \qquad x = -\tfrac{3}{2} = -1\tfrac{1}{2}.$$

The point $(-1\tfrac{1}{2}, 0)$ is on the x-axis and on the line. Why?

This line is the locus of the equation y = 2x + 3, and the
graph of the function

$$2x + 3.$$

y = 2x + 3 is the equation of this line.

The same line of reasoning shows that if the line is moved 
parallel to itself until it cuts the y-axis at a distance b from 
the origin, its equation will become 

$$y = mx + b. \tag{1}$$

This is called the slope form of the equation of a straight line. 

The function mx + b, where m and b are constant and x 
variable, is called a linear function of x, because its graph is a 
straight line. 
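The slope form lends itself to direct computation. The sketch below is an illustration in Python, not part of the original text; the function names `ordinate` and `x_intercept` are assumptions. It uses the line y = 2x + 3 of this section.

```python
from fractions import Fraction


def ordinate(m, b, x):
    """Value of the linear function mx + b at abscissa x."""
    return m * x + b


def x_intercept(m, b):
    """Abscissa at which y = mx + b cuts the x-axis (put y = 0, solve for x)."""
    if m == 0:
        raise ValueError("a horizontal line has no single x-intercept")
    return Fraction(-b) / Fraction(m)


m, b = 2, 3
print("y-intercept:", ordinate(m, b, 0))
print("x-intercept:", x_intercept(m, b))

# Each ordinate on y = 2x + 3 exceeds by 3 the ordinate on y = 2x.
print([ordinate(m, b, x) - ordinate(m, 0, x) for x in range(-2, 3)])
```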

26. First-degree Equation.—Any first-degree equation in x and
y can be reduced to the form

$$y = mx + b$$

simply by solving for y. Take the general form of a first-degree
equation in x and y,

$$Ax + By + C = 0.$$

Solving for y gives

$$y = -\frac{A}{B}x - \frac{C}{B}. \tag{2}$$

This shows that the locus of the equation

$$Ax + By + C = 0$$

is a straight line with slope equal to -A/B, and y-intercept
equal to -C/B.

If B = 0 the equation takes the form

$$Ax + C = 0,$$

which is a straight line parallel to the y-axis. Its slope is
infinite and its y-intercept is infinite.

For example,

$$2x - 3y - 6 = 0$$

is the equation of a straight line with slope

$$m = -\frac{2}{-3} = \frac{2}{3}$$

and y-intercept

$$b = -\frac{-6}{-3} = -2.$$

This line is the line I in Fig. 15. Its equation may be written

$$y = \tfrac{2}{3}x - 2.$$

Line I is the graph of the function

$$\tfrac{2}{3}x - 2.$$
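The reduction of the general form to the slope form may be written as a short routine. The sketch below is an illustration in Python, not part of the original text; the function name `slope_form` is an assumption. The case B = 0 is refused, since the line is then parallel to the y-axis.

```python
from fractions import Fraction


def slope_form(A, B, C):
    """Return (m, b) such that Ax + By + C = 0 is the line y = mx + b."""
    if B == 0:
        raise ValueError("B = 0: the line is parallel to the y-axis")
    return Fraction(-A, B), Fraction(-C, B)


# Line I of the text: 2x - 3y - 6 = 0.
m, b = slope_form(2, -3, -6)
print("slope:", m, " y-intercept:", b)
```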

27. Intercept Form.—Solve the equation

$$\frac{x}{a} + \frac{y}{b} = 1$$

for y, giving

$$y = -\frac{b}{a}x + b.$$

This is a straight line with slope m = -b/a, and y-intercept b.
Putting y = 0 in the equation and solving for x gives the x-intercept
equal to a.

$$\frac{x}{a} + \frac{y}{b} = 1 \tag{3}$$

is called the intercept form of the equation of a straight line. The
equation of line I (Fig. 15) is at once put into this form by transposition
and division by 6, giving

$$\frac{x}{3} + \frac{y}{-2} = 1,$$

showing the x-intercept to be 3 and the y-intercept to be -2.

If the x-intercept is 3 and the y-intercept 2, as for line II
(Fig. 15), the equation of the line is at once written,

$$\frac{x}{3} + \frac{y}{2} = 1.$$

Solving for y gives

$$y = -\tfrac{2}{3}x + 2.$$

Therefore,

$$m = -\tfrac{2}{3}, \qquad b = 2.$$
28. Increasing and Decreasing Linear Functions.—Notice that
the slope, $-\tfrac{2}{3}$, of line II is negative. This means that the line
slopes down to the right. In other words the function

$$-\tfrac{2}{3}x + 2$$

decreases with increase of x.

For line I the slope $\tfrac{2}{3}$ is positive, showing that the line slopes
up to the right. Therefore, the function

$$\tfrac{2}{3}x - 2$$

increases with increase of x. A positive slope gives an increasing
function. A negative slope gives a decreasing function.

29. Straight Line through Two Points.—Suppose the coordinates
of two points are given and it is desired to write the equation
of the straight line through these points. Let the points be $P_1$
(1, 2) and $P_2$ (3, 4) of Fig. 16. Let P, with coordinates x and y,
be any point on the required line. The line through $P_1$ and P
must be coincident with the line through $P_1$ and $P_2$. Therefore,
the slope of $P_1P$ must equal the slope of $P_1P_2$.

Fig. 16.—Straight line through two points.

By inspection of the figure, and by the rule given on page 59, the
slope of $P_1P$ is

$$\frac{CP}{P_1C} = \frac{y - 2}{x - 1},$$

and the slope of $P_1P_2$ is

$$\frac{C_2P_2}{P_1C_2} = \frac{4 - 2}{3 - 1}.$$

Since these two slopes are equal, we write

$$\frac{y - 2}{x - 1} = \frac{4 - 2}{3 - 1}.$$

Solving for y gives

$$y = x + 1.$$

In general, if $P_1$ is the point $(x_1, y_1)$ and $P_2$ is $(x_2, y_2)$, the equation
of the line through $P_1$ and $P_2$ is

$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}.$$

This can be readily reduced to any one of the standard forms
previously given.
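The two-point equation reduces to the slope form by a few steps of arithmetic. The sketch below is an illustration in Python, not part of the original text; the function name `line_through` is an assumption. It is applied to the points (1, 2) and (3, 4) used above.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (m, b) of the line y = mx + b through the two given points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("equal abscissas: the line is parallel to the y-axis")
    m = Fraction(y2 - y1) / Fraction(x2 - x1)
    b = y1 - m * x1  # from y1 = m * x1 + b
    return m, b


m, b = line_through((1, 2), (3, 4))
print(f"y = {m}x + {b}")
```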

In a later chapter there will be occasion to use the equation of 
a straight line in getting the straight line of closest fit to a given 
set of observed data. 

30. Circle.—The distance of the point $P_1$ (3, 4) from the origin
in Fig. 17 is $\sqrt{3^2 + 4^2} = 5$. The distance r, of any point P(x, y)
from the origin, is $\sqrt{x^2 + y^2}$. If the point P may take only
those positions such that its distance from the origin is 5, the
equation

$$\sqrt{x^2 + y^2} = 5$$

holds true for all such points and no others. The set of points
fulfilling the condition that each is at a distance 5 from the origin
forms a circle with radius 5 and center at the origin. So

$$\sqrt{x^2 + y^2} = 5$$

is the equation of a circle with radius 5 and center at the origin.
The equation is usually written in the form

$$x^2 + y^2 = 25.$$

If the radius is r, the equation is

$$x^2 + y^2 = r^2.$$

If the center is not at the origin, the equation is easily derived.
It is

$$(x - a)^2 + (y - b)^2 = r^2,$$

for which the center is at the point (a, b). By expanding, the
equation becomes

$$x^2 + y^2 - 2ax - 2by + a^2 + b^2 - r^2 = 0.$$

Notice that it has first-degree terms in x and y as well as second-degree
terms.
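Both forms of the circle equation may be tested on particular points. The sketch below is an illustration in Python, not part of the original text; the function names and the letters D, E, F for the coefficients of the expanded form are assumptions. Whole numbers are used so that the comparisons are exact.

```python
def on_circle(x, y, a=0, b=0, r=5):
    """True when (x, y) is at distance r from the center (a, b)."""
    return (x - a) ** 2 + (y - b) ** 2 == r ** 2


def expanded_coefficients(a, b, r):
    """Coefficients (D, E, F) of x**2 + y**2 + D*x + E*y + F = 0."""
    return -2 * a, -2 * b, a * a + b * b - r * r


print(on_circle(3, 4))                 # the point P1 of Fig. 17
D, E, F = expanded_coefficients(1, 2, 5)
x, y = 4, 6                            # a point 5 units from the center (1, 2)
print(x * x + y * y + D * x + E * y + F)
```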

31. Parabola.—Examine the equation

$$y = x^2$$

to see what may be discovered in regard to its locus. When
x = 0, y = 0; the locus must pass through the origin. When x =
1, y = 1, when x = 2, y = 4, when x = 3, y = 9, etc. As x
increases, y increases more rapidly than does x. In fact, it
increases as the square of x. The larger is x, the faster
y increases. Negative values of x give the same values of y as do
the numerically equal positive values of x. There are no values
of x that will give negative values of y. Why?

It is seen that the locus must go through the origin, extending
upward both to the right and to the left, getting steeper as it
goes up. The right-hand part is symmetrical to the left-hand part
with respect to the y-axis. There is no part of the curve below
the x-axis. This curve is called a parabola (see Fig. 18).

The locus of the equation

$$y = ax^2$$

will also be a parabola. A point on this curve with a given abscissa
will have an ordinate a times the ordinate of that point on the
locus of

$$y = x^2$$

that has the same abscissa.

The line of symmetry, in this case the y-axis, is called the axis 
of the parabola. Where the parabola intersects its axis, in this 
case the origin, is called the vertex of the parabola. 

32. Second-degree Function of x.—If the vertex is not at the
origin, the form of the equation changes. It can be shown that if
the vertex is at the point (h, k), and with the axis parallel to the
y-axis of coordinates, the equation changes from

$$y = ax^2$$

to

$$(y - k) = a(x - h)^2.$$

Expanding, transposing, and solving for y gives

$$y = ax^2 - 2ahx + ah^2 + k.$$

This function of x has a term in $x^2$, a term in x and constant
terms, a second-degree function. The equation may then be
written in the form

$$y = ax^2 + bx + c,$$

the general equation of a parabola with its axis parallel to the
y-axis.
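Comparing the two forms term by term gives b = -2ah and c = ah² + k, from which the vertex can be recovered. The sketch below is an illustration in Python, not part of the original text; both function names are assumptions, and exact fractions are used.

```python
from fractions import Fraction


def general_from_vertex(a, h, k):
    """Coefficients (a, b, c) of y = ax**2 + bx + c for vertex (h, k)."""
    return a, -2 * a * h, a * h * h + k


def vertex_from_general(a, b, c):
    """Vertex (h, k) recovered from b = -2ah and c = ah**2 + k."""
    if a == 0:
        raise ValueError("a = 0 gives a straight line, not a parabola")
    h = Fraction(-b) / Fraction(2 * a)
    k = c - a * h * h
    return h, k


print(general_from_vertex(1, 2, 3))
print(vertex_from_general(1, -4, 7))
```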

33. Use of Parabola.—Many sets of observed data which cannot
be fitted by a straight line can be quite closely fitted by a
parabola. It may be recalled that the formula giving the distance
s through which a body falls under the influence of gravitation
in time t is

$$s = \tfrac{1}{2}gt^2.$$

This function plotted gives a parabola, since it is a second-degree
function of t (see Fig. 12). What would be the meaning for
negative values of t?

34. Uniform and Non-uniform Scales.—So far, all the plotting 
of values of variables has been done with a uniform scale. By 
this is meant that for equal differences in the variable there are 
equal segments of the scale line. On the uniform or arithmetic 
scale the distance from x = 1 to x = 2 equals the distance from
x = 3 to x = 4, and so on indefinitely. Many times a non-uniform
scale is used in order to simplify the plotting of the graph
and to facilitate the study of the nature of the function. 

The idea may easily be illustrated by means of the parabola 
equation 

y = (1/2)x^2.

In this equation put z for x^2 and the equation takes the form

y = (1/2)z.

This is linear in z. Tabulate values of z and y for a series of 
values of x. 


Table XXIX

z = x^2, y = (1/2)z

x      z      y
0      0      0
1      1      1/2
2      4      2
3      9      4 1/2
4      16     8


35. Scale of Squares.—In Fig. 19 on the x-axis are located
points at distances from the origin equal to the values of z
tabulated in Table XXIX. These points are marked with the
corresponding values of x shown in the table. Vertical lines have been
drawn at these points for convenience in plotting. The x-scale
thus derived is not a uniform scale. The distance from x = 0
to x = 1 is less than from x = 1 to x = 2, which, in turn, is less
than from x = 2 to x = 3. If this scale is continued, the scale
distances for equal x-differences will continue to increase as x
increases. This may be called a scale of squares. The actual
distance from the origin of any point on the x-axis is the square of
the number with which that point is marked.

Now plot, on the x- and y- scales, the pairs of values of x and y 
as tabulated, paying no attention to the values of z. With these 
scales available for plotting, it is superfluous to tabulate values 
of z. The points are marked with small circles. These points 
appear to be in a straight line. In fact, since the parabolic
function y = (1/2)x^2 is a square function, these points must fall in a
straight line, being plotted to a scale of squares. This line,
moreover, is the locus of the equation y = (1/2)z and must be straight
since (1/2)z is a linear function.

36. Inverse Process.—Suppose the tabulated values of x and y
had been the result of observed measurements. Then if the
pairs of values, plotted on paper ruled horizontally in a scale of
squares and vertically in a uniform scale, determine a set of
points in a straight line, it is known that, so far as the observed
values are concerned, y is a parabolic function of x. If this line
goes through the origin, it is known that the equation of the parabola
fitting the observed values is of the form

y = ax^2.





The slope of the line is a. Why? To determine a, take two
points on the line and divide the difference of their ordinates by
the difference of their abscissas, both measured in the same units
of measurement. For instance, the difference of the ordinates of
P2 and P4 is 6. The difference of their abscissas is 12. The
slope of the line is then 6/12 = 1/2 = a. This fixes the equation
of the parabola as

y = (1/2)x^2.
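The slope computation can be repeated for every pair of points in Table XXIX. The Python sketch below is an illustration added here, not the author's procedure; the function name slope_on_square_scale is an assumption, and P2 and P4 are taken to be the points for x = 2 and x = 4, which agrees with the differences 6 and 12 quoted above.

```python
from fractions import Fraction

# Values of x and y from Table XXIX.
xs = [0, 1, 2, 3, 4]
ys = [Fraction(0), Fraction(1, 2), Fraction(2), Fraction(9, 2), Fraction(8)]


def slope_on_square_scale(p, q):
    """Slope of the line through two (x, y) points when each point is
    placed at a horizontal distance x squared from the origin."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 ** 2 - x1 ** 2)


points = list(zip(xs, ys))
a = slope_on_square_scale(points[2], points[4])   # P2 and P4

# If every pair of points gives the same slope, the points are collinear
# on the scale of squares.
all_slopes = {slope_on_square_scale(points[i], points[j])
              for i in range(len(points))
              for j in range(i + 1, len(points))}
```

Exact fractions are used so that the equality of the slopes is not blurred by rounding.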

For negative values of x, will the straight line of Fig. 19 con¬ 
tinue straight on through the origin or will it change its direction? 
What is the significance of the result? 

37. Establishing a Law.—If the values of x and y in Table
XXIX are the result of observed measurements, there are
too few of them to warrant assuming that a law has been
established for all similar measurements. If, however, a large number
of observations be made, or many sets of observations taken,
and always the resultant graph falls very nearly on the line
P1P4 of Fig. 19, the probability is very great that the law as
stated by the equation

y = (1/2)x^2

is established. It would then be quite safe to predict the value of 
y for any assumed value of x within the limits of the observations. 

38. Example.—For example, suppose that a large number of
observations of s, the space passed through by a falling body in 
time t, be tabulated and plotted on uniform scales for both s 
and t. If the resulting points seem to be nearly on a curve that 
looks like a parabola, they can then be plotted on a uniform s- 
scale and a scale of squares for t. If it is found these points fall 
on a straight line, our hypothesis of the parabola is confirmed. 
At any rate, by getting the straight line of closest fit to these 
points, the parabola of closest fit to the original observations may 
be determined. 
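The procedure of this example may be sketched in Python as follows. The observations are hypothetical (they are not from the text and were made up to lie near s = 16t^2, with t in seconds and s in feet), and the straight line of closest fit is taken here to be the ordinary least-squares line, which is an assumption about the method of fitting.

```python
def fit_line(zs, ss):
    """Least-squares line s = m*z + b through the points (z, s)."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_s = sum(ss) / n
    sxx = sum((z - mean_z) ** 2 for z in zs)
    sxy = sum((z - mean_z) * (s - mean_s) for z, s in zip(zs, ss))
    m = sxy / sxx
    b = mean_s - m * mean_z
    return m, b


# Hypothetical observations (assumed for illustration only).
ts = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ss = [4.1, 15.8, 36.3, 63.9, 100.2, 143.8]

zs = [t ** 2 for t in ts]   # positions of the points on the scale of squares
m, b = fit_line(zs, ss)     # straight line of closest fit on that scale

# The parabola of closest fit to the observations is then s = m*t**2 + b.
```

With data of this kind, m comes out close to 16 and b close to 0, so the fitted parabola is nearly s = 16t^2.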

39. Logarithmic Scale.—Another very useful non-uniform
scale is the logarithmic scale.¹ A function which a large class of
observed data follows closely is ka^x, giving the equation

y = ka^x,

in which k and a are constants. Taking logarithms of both
sides of the equation gives 

log y = x log a + log k. 

¹ A brief discussion of logarithms may be found in Appendix A.



Put

z for the variable log y,
m for the constant log a,
b for the constant log k,

and the logarithmic equation takes the form 

z = mx + b. 

This is a linear equation and with uniform scales for x and z 
will plot as a straight line. If, then, the uniform scale for z be 
replaced by a logarithmic scale for y, the original equation will 
plot as a straight line. 

40. Example.—Take, for example,

y = 2^x.

Then

log y = x log 2.

Put

z = log y, log 2 = 0.301.

Therefore,

z = 0.301x,

and

y = antilog z.

Tabulate values of z and y for a series of values of x. 


Table XXX 



Values of z were found and tabulated by multiplying the
corresponding value of x by 0.301, the logarithm of 2. Each
value of y was obtained from a table of logarithms by looking up
the number of which the corresponding value of z is the logarithm,
that is, by finding antilog z.
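The tabulation just described can be imitated in Python, where raising 10 to the power z takes the place of looking up the antilogarithm in a table. This is only a sketch: the function name tabulate is an assumption, and the values of x used are illustrative and are not taken from Table XXX.

```python
LOG_2 = 0.301   # logarithm of 2 to base 10, rounded as in the text


def tabulate(xs):
    """Return rows (x, z, y) with z = 0.301*x and y = antilog z."""
    rows = []
    for x in xs:
        z = LOG_2 * x
        y = 10 ** z            # antilog z
        rows.append((x, z, y))
    return rows


# Illustrative values of x (assumed, not those of the original table).
rows = tabulate([0, 1, 2, 3, 4])
for x, z, y in rows:
    print(f"{x:2d}  {z:5.3f}  {y:7.3f}   direct power of 2: {2 ** x}")
```

Because 0.301 is a rounded logarithm, the values of y found this way differ slightly from the exact powers of 2.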





Of course, the values of y may be computed directly from the 
equation 

y = 2^x

by finding powers of 2 corresponding to each value of x. 

In Fig. 20 the x-scale and the z-scale are uniform. The
equation,

z = 0.301x,

being linear, will plot to these scales as a straight line. A
non-uniform y-scale is constructed so that the distances from the
origin marked with values of y are equal to the logarithms of these
values as measured on the uniform z-scale. This y-scale is a
logarithmic scale. Suppose that the values of z have not been
introduced, but the values of y have been computed directly from
the equation

y = 2^x,

and tabulated with the values of x. If the pairs of values of x
and y be plotted on the uniform x-scale and the logarithmic
y-scale, the straight line of Fig. 20 is obtained at once.

41. Exponential.—If the values of x and y are both plotted to
uniform scales, the exponential curve of Fig. 21 is obtained. For
negative values of x, this curve gets closer and closer to the
x-axis but never reaches it. If a set of observed measurements
when plotted seems to follow this sort of curve, it may be tested 
by plotting on a uniform horizontal scale with a logarithmic 
vertical scale. If the points fall on a straight line, the observed 
data follow the exponential law. 
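The test described above amounts to asking whether the points (x, log y) lie on a straight line. A minimal Python sketch follows. The data are hypothetical, generated from y = 3 · 2^x purely for illustration, and the least-squares line is an assumed choice for the line through the points.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = k * a**x by fitting a least-squares line to (x, log10 y).

    Returns (k, a, max_residual); max_residual is the largest vertical
    distance of any point from the line on the logarithmic scale.
    """
    zs = [math.log10(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    m = (sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_z - m * mean_x
    max_residual = max(abs(z - (m * x + b)) for x, z in zip(xs, zs))
    return 10 ** b, 10 ** m, max_residual


# Hypothetical data (assumed): exact values of y = 3 * 2**x.
xs = [-2, -1, 0, 1, 2, 3]
ys = [3 * 2 ** x for x in xs]
k, a, max_residual = fit_exponential(xs, ys)
```

Here k = antilog b and a = antilog m, in the notation of Sec. 39. A residual near zero indicates that the data follow the exponential law; observed data would show small departures from the line.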

Paper ruled with uniform scale in one direction and logarithmic 
scale in the other direction is now easily obtained in the market. 
It is almost indispensable in studying and comparing rates or
percentages of change. The subject will be taken up
in more detail in a later chapter.

42. Frequency Graphs.—A very important form of graphic
representation is that of the frequency distribution. The
graph is called a histogram. Do not confuse this term with
historigram. The word "historigram" applies only to historical
series in which time is the independent variable.
The form and the construction of the histogram are best shown
by an actual example. Take the distribution of leaf lengths
given in Table XXIII, plotting size of item on the horizontal
scale and frequency on the vertical scale.
scale and frequency on the vertical scale. 

43. Rectangular Histogram.—The rectangular histogram is 
quite similar to the vertical-bar diagram. Suitable scales must 
be chosen so that the graph of the data may be made to fit the 
paper, be of sufficient size to be read easily, and be of good pro¬ 
portions. The width of the rectangles is equal to the class inter¬ 
val. The left-hand side of the first rectangle is plotted at the 
lower boundary of the first class, bringing sides of the rectangles 
at class boundaries throughout the table. The right-hand side
of the last rectangle must then come at the upper boundary of
the last class. The height of each rectangle is made equal to the
frequency of the class it represents. The finished rectangular
histogram is shown in Fig. 22.

Fig. 22.—Rectangular histogram of sixty leaf lengths. (Horizontal
scale: length in millimeters.)
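The rules for placing the rectangles can be written out as a short Python sketch. The frequencies below are hypothetical and are not those of Table XXIII; the classes 20-24, ..., 50-54 and the lower boundary 19.5 are likewise assumptions made only so that the example is complete.

```python
def histogram_rectangles(lower_boundary, class_interval, frequencies):
    """Return one (left, right, height) triple per class.

    The left side of the first rectangle is at the lower boundary of the
    first class; each width is the class interval; each height is the
    class frequency.
    """
    rectangles = []
    left = lower_boundary
    for f in frequencies:
        right = left + class_interval
        rectangles.append((left, right, f))
        left = right            # next rectangle starts at this boundary
    return rectangles


# Hypothetical frequencies for seven classes (sum 60); NOT Table XXIII.
frequencies = [2, 7, 14, 16, 12, 6, 3]
rectangles = histogram_rectangles(19.5, 5, frequencies)
```

The right-hand side of the last rectangle then falls automatically at the upper boundary of the last class.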

44. What Is Shown.—It can now be seen at a glance:

1. The manner in which frequencies increase and decrease.

2. The sizes at which the frequencies tend to concentrate.

3. The class of greatest frequency.

4. Whether there is more tendency to long or to short leaves.
It is noticeable that the frequencies increase up to about the 
middle class and then decrease to the end. The maximum fre¬ 
quency occurs in the middle class. 

45. Meaning of Area.—If the class interval is taken as the unit 
of horizontal measurement and unit frequency as the unit of 
vertical measurement, the area of the rectangular histogram up 
to the upper boundary of any class is then the total frequency up 
to and including that class. 

46. Frequency Polygon.—Another form of histogram is the 
frequency polygon. This is shown in Fig. 23. Scales are chosen
as was done for the rectangular histogram. The values of the
frequencies are plotted as points at the midpoint of their respective
classes. These points are connected by straight lines from each
to the next. In the tabulated sample of 60 leaf lengths, the
frequency of class 15-19 is 0. So the polygon is drawn to the base
line at 17, the midpoint of this class. Every class smaller than
this has a frequency of 0; so the polygon stops at 17 on the base
line. Similarly, for the other end of the table, the polygon
stops at 57.

Fig. 23.—Frequency polygon of sixty leaf lengths.

47. Area.—The area of the polygon is exactly the same as that 
of the rectangles. This may be readily seen if the polygon is
drawn by connecting the midpoints of the tops of the rectangles 
and extending it to the midpoints of the next class above and 
below those represented by the rectangles. It follows that, if 
units are chosen as was done for the rectangles, the area of the 
polygon up to the midpoint of any class equals the total frequency 
up to and including half of that class. 
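The equality of the two areas can be verified by direct computation. In the Python sketch below the frequencies are hypothetical (they are not the values of Table XXIII), the function names are assumptions, and the class midpoints 22, 27, ..., 52 are chosen to agree with the end points 17 and 57 mentioned in the text.

```python
def polygon_vertices(first_midpoint, class_interval, frequencies):
    """Vertices of the frequency polygon, brought down to the base line
    at the midpoints of the empty classes just below and just above."""
    xs = [first_midpoint + i * class_interval
          for i in range(-1, len(frequencies) + 1)]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))


def area_under(vertices):
    """Area between a broken line and the base line (sum of trapezoids)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))


# Hypothetical frequencies (sum 60) for classes 20-24, ..., 50-54.
frequencies = [2, 7, 14, 16, 12, 6, 3]
class_interval = 5
vertices = polygon_vertices(22, class_interval, frequencies)

polygon_area = area_under(vertices)
rectangle_area = class_interval * sum(frequencies)
```

Each trapezoid takes half of the frequency of the class on either side of it, which is why the two totals agree whatever the frequencies may be.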

The fact that the polygon extends beyond the limits of the 
table suggests that, if the lengths of all the leaves on the tree, 
from which the sample of 60 leaves was taken, had been measured,
a few would have been found shorter than any in the sample and 
also a few longer. It is likely that this was true. 

The frequency polygon is a line or peak-top diagram. It is 
probably easier to see the tendencies in variation of frequency in 
the polygon than in the rectangular histogram. 

48. Distribution Shown by Sample.—Both the polygon and 
the rectangles show the variation of frequencies for the 60 leaf 
lengths actually measured. The presence or absence of irregu¬ 
larities of distribution in the sample is shown. If one is interested 
in these 60 only, the representation is complete. But the purpose
of the investigation would usually be to ascertain the manner
in which frequencies are distributed over various sizes for all
the leaves on the tree or perhaps for all the leaves of the particular
variety used. For this purpose, the sample used must be a
random sample, so chosen as to be representative of all leaf
lengths of the entire set it is desired to study. The entire set 
which the sample is supposed to represent is called the universe or 
population from which the sample is taken. A sample of 60 leaf 
lengths is of too small a number to be reliable as a random sample. 

49. Frequency Curve.—Assume, however, that this sample of 
60 is fairly representative for the population from which it was 
chosen, say all the leaves on the tree, a large number. Then to 
get a graph for the entire population, a smooth curve is drawn, 
based on either the rectangular histogram or the frequency 
polygon. This smooth curve is called a frequency curve. In 
drawing this curve, bear in mind that its purpose is to bring out 
general tendencies of the entire universe represented by the 
sample. This curve should meet the base line at or near the same
points where the frequency polygon meets it. It goes up to the
right, relatively slowly at first, and then gets steeper to a certain
point where it becomes less steep until it reaches a high point and 
turns down. It now gets steeper until a point is reached where it 
flattens out again to meet the base line. This distribution sug¬ 
gests the symmetric binomial distribution of Chap. II. If leaves x 
mm. shorter than the average are equally likely to occur with 
leaves x mm. longer than the average, and if medium-length 
leaves are most likely to occur, while the shortest leaves and the 
longest leaves are few in number, it is justifiable to assume that 
the symmetric binomial distribution would fit fairly well. The 
theoretical frequencies could then be computed from the coefficients
of the expansion of (1/2 + 1/2)^6, and the curve plotted.
From the symmetry of the coefficients, the curve would be a 
symmetric, bell-shaped curve. 
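The theoretical frequencies mentioned above can be computed as in the following Python sketch. It is an illustration, not the author's computation; the function name is an assumption, and the total of 60 is the size of the leaf sample.

```python
from math import comb


def symmetric_binomial_frequencies(n, total):
    """Terms of the expansion of (1/2 + 1/2)^n, each multiplied by total."""
    return [total * comb(n, r) / 2 ** n for r in range(n + 1)]


# The expansion with n = 6 has seven terms; the sample has 60 leaves.
theoretical = symmetric_binomial_frequencies(6, 60)
for r, f in enumerate(theoretical):
    print(f"term {r}: {f:6.2f}")
```

The frequencies read the same from either end, which is the symmetry that makes the plotted curve bell-shaped.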

Now, it is not known that leaf length should be distributed 
according to the above-mentioned law. In fact, it seems reason¬ 
able that in some varieties of leaves there would be more of a 
tendency to long leaves or, in others, short leaves, than is shown 
by this sample. Most frequency distributions are of such a 
nature that some one of a set of type forms of curves can be found 
to fit the distribution with a fair degree of closeness. The study 
of these type curves requires more advanced mathematics. 

50. Drawing the Smooth Curve.—It will be necessary, here, to
draw the curve so that, exercising the best judgment, it represents 
the distribution of frequencies in the entire population from which 
the sample was selected. If it is known that the data are the 
result of natural phenomena or of pure chance, it is justifiable to 
smooth out irregularities, assuming that they are accidents of the 
sample. This matter could be tested out by collecting other 
samples and examining them to see whether the same irregulari¬ 
ties occur at the same size of item. The larger the sample, if 
properly selected, the more likely it is to be representative of the 
entire population. If cumulative errors are present, increasing 
the number of items in the sample is of no use. In the case of 
economic or sociological data, irregularities are quite likely to 
occur that it would not be justifiable to smooth out. 

In most cases the high point of the curve will occur in the same 
class as the high point of the polygon or the highest rectangle. 
If, as in this case, the next lower class has greater frequency than
the next higher class, the high point of the curve may be placed
below (to the left of) the middle of the class of greatest frequency.
The histogram readily shows the relative weights of these 
frequencies next to the class of greatest frequency. 

Sometimes the curve should go higher than the highest point of 
the polygon or the highest rectangle, sometimes not so high. 
From the heights of the third, fourth, and fifth rectangles of Fig. 
22, it would seem likely that the sample chosen has too many 
items in the third class, not enough in the fourth class, and per¬ 
haps too many in the fifth class, to be truly representative of all 
the leaf lengths of the entire tree. If that is so, the curve should 
lop off, to some extent, the heights for the third and fifth classes, 
and go above that for the fourth class as shown. In some cases 
the high point of the frequency polygon will tower above the 
adjacent points plotted, making it evident that this high frequency 
is an accident of the sample. If so, the smooth curve should cut 
off this high peak, bringing out what is likely to be the general 
tendency. 

51. Area of Curve.—The area under the curve should be as
nearly as possible the same as that of the rectangles or the 
polygon. This makes the parts cut off from the rectangles or the 
polygons equal to the parts filled in outside them. This is similar 
to the problem of making a railroad grade. The dirt taken from 
the cuts should even up the fills, and dirt should not be hauled 
farther than necessary. It should be unnecessary to carry dirt 
over the top of the hill to the other side. Any number of curves 
having the correct area could be drawn. Necessarily only one of 
them is the correct one, giving the true distribution of the 
universe from which the sample was taken. 

52. Histograms from Pure Chance.—Figure 24 shows a
frequency polygon of the number of times each possible number 
of heads occurred in 500 random throws of seven dimes as 
tabulated in Table XV. 

Strictly, only the points plotted are used, since there could not 
be a fractional number of heads. Number of heads forms a 
discrete series. The points are connected, however, by straight 
lines in order to guide the eye in studying the frequency 
distribution. 

The theoretical frequencies, shown in Table XVII, column f,
are plotted in Fig. 24 and a smooth curve drawn through the
points. This curve is very close to what one would have obtained
by smoothing in the same manner as was done for leaf lengths in
Fig. 23. Comparison of the polygon and the smooth curve shows
what adjustments are to be made in the actual observations to 
get the theoretical results due to the laws of probability. 


Fig. 24.—Frequency polygon of number of heads in 500 throws of 7 dimes.
(Horizontal scale: number of heads.)


53. Symmetric Binomial Histograms.—The theoretical frequencies
are the terms in the binomial expansion of

(1/2 + 1/2)^7,

each multiplied by the total number of throws (see Chap. II, pp.
31 to 33). If there were eight dimes, the theoretical frequencies
would be the terms of

(1/2 + 1/2)^8,

each multiplied by the total number of throws. In general, if
there are n dimes, the theoretical frequencies are the terms of
the expansion of

(1/2 + 1/2)^n,

each multiplied by the total number of throws. In every case
the sum of the frequencies, that is, the sum of the ordinates of
the points plotted, equals the total number of throws of the coins.
This sum, then, may be kept constant. If the same horizontal
scale is kept as n increases, the width of the polygon, its spread,
increases proportionately. It is conceivable that some function
of this spread may be devised such that it may be used as a unit
of measurement on the horizontal scale each way from the maximum
ordinate. Notice that the maximum ordinate stands at
the midpoint of the base line when p = q = 1/2. Several such
functions of the spread have been used. The one most used is
designated by the Greek letter small sigma, σ:

σ = √(npq).

A fuller discussion of the derivation and meaning of this function
will be given in the chapter on dispersion. Each frequency
plotted corresponds to a term of the expansion of (1/2 + 1/2)^n.
Let x equal the number of terms from the middle of the series,
negative if to the left and positive if to the right of the middle.
Using the measure σ, the abscissas of points plotted will be x/σ.
Then, if each ordinate is multiplied by σ, the areas under successive
polygons, as n increases, will be the same.

Fig. 25.—Symmetric binomial frequency graphs.
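The effect of using σ as the unit can be checked with a short Python sketch. The function names standardized_polygon and area_under are assumptions introduced here, and the polygon is closed on the base line one step beyond each end, in the manner used for the frequency polygons earlier in the chapter.

```python
from math import comb, sqrt


def standardized_polygon(n, throws=500):
    """Return sigma and the points (x/sigma, sigma*y) for (1/2 + 1/2)^n."""
    sigma = sqrt(n) / 2                    # sqrt(npq) with p = q = 1/2
    points = []
    for r in range(n + 1):
        x = r - n / 2                      # terms counted from the middle
        y = throws * comb(n, r) / 2 ** n   # theoretical frequency
        points.append((x / sigma, sigma * y))
    return sigma, points


def area_under(points):
    """Area under the polygon, with zero ordinates one step past each end."""
    step = points[1][0] - points[0][0]
    closed = ([(points[0][0] - step, 0.0)] + points
              + [(points[-1][0] + step, 0.0)])
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(closed, closed[1:]))


areas = {}
for n in (2, 4, 10, 50):
    sigma, points = standardized_polygon(n)
    areas[n] = area_under(points)
```

The horizontal step is 1/σ and every ordinate is multiplied by σ, so the two factors cancel and each area equals the number of throws.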

Beginning with (1/2 + 1/2)^2, (n = 2), and increasing n, several
polygons are constructed in Fig. 25 on the basis of 500 throws.
The computations for coordinates of the points are tabulated as
follows:

n = 2, σ = √(npq) = √2/2 = 0.707, 1/σ = 1.414.

(1/2 + 1/2)^2 = (1/4)(1 + 2 + 1) = 0.25 + 0.50 + 0.25.



n = 2

x       x/σ      Per cent y is of 500      y        σy
-2      -2.8           0                   0        0
-1      -1.4          25                   125      88
 0       0            50                   250      176
 1       1.4          25                   125      88
 2       2.8           0                   0        0
Total                100                   500


(1/2 + 1/2)^4 = (1/16)(1 + 4 + 6 + 4 + 1) = 0.0625 + 0.2500
+ 0.3750 + 0.2500 + 0.0625.

n = 4

x       x/σ      Per cent y is of 500      y        σy
-3      -3            0.00                 0        0
-2      -2            6.25                 31       31
-1      -1           25.00                 125      125
 0       0           37.50                 188      188
 1       1           25.00                 125      125
 2       2            6.25                 31       31
 3       3            0.00                 0        0
Total               100.00                 500

n = 10, σ = √(10 × 1/2 × 1/2) = √10/2 = 1.581, 1/σ = 0.632.

(1/2 + 1/2)^10 = (1/2^10)(1 + 10 + 45 + 120 + 210 + 252 + 210
+ 120 + 45 + 10 + 1)
= 0.000977 + 0.009766 + 0.043945 + 0.117187 + 0.205078
+ 0.246094 + 0.205078 + 0.117187 + 0.043945 + 0.009766
+ 0.000977.

n = 10

x       x/σ        Per cent y is of 500      y        σy
-6      -3.795          0.00                 0        0
-5      -3.162          0.10                 0.5      1
-4      -2.530          0.98                 5        8
-3      -1.897          4.39                 22       35
-2      -1.265         11.72                 59       93
-1      -0.632         20.51                 102      161
 0       0             24.61                 123      194
 1       0.632         20.51                 102      161
 2       1.265         11.72                 59       93
 3       1.897          4.39                 22       35
 4       2.530          0.98                 5        8
 5       3.162          0.10                 0.5      1
 6       3.795          0.00                 0        0
Total                 100.02                 500

n = 50, σ = √(50 × 1/2 × 1/2) = √50/2 = 3.535, 1/σ = 2/√50 = 0.2828.

(1/2 + 1/2)^50 = (1/2^50)(1 + 50 + 1,225 + 19,600 + 230,300
+ 2,118,760 + 15,890,700 + 99,884,400 + 536,878,650
+ 2,505,433,700 + 10,272,278,170 + 37,353,738,800
+ 121,399,651,100 + 354,860,518,600 + 937,845,656,300
+ 2,250,829,575,120 + 4,923,689,695,575 + 9,847,379,391,150
+ 18,053,528,883,775 + 30,405,943,383,200 + 47,129,212,243,960
+ 67,327,446,062,800 + 88,749,815,264,600 + 108,043,253,365,600
+ 121,548,660,036,300 + 126,410,606,437,752 + each of the numbers
preceding this last one, but coming in opposite order).

The last term written above is ₅₀C₂₅, the middle term of the
expansion.
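The labor of this expansion is now easily avoided. As an illustration only, the coefficients may be generated with the comb function of Python's standard math module (available in Python 3.8 and later); the variable names are assumptions.

```python
from math import comb

n = 50
# Coefficients 50C0, 50C1, ..., 50C50 of the expansion of (1/2 + 1/2)^50,
# before division by 2^50.
coefficients = [comb(n, r) for r in range(n + 1)]

middle = coefficients[n // 2]     # the middle term, 50C25
total = sum(coefficients)         # should equal 2^50
```

Dividing each coefficient by 2^50 and multiplying by 500 gives the theoretical frequencies for 500 throws.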



n = 50

x       x/σ       Per cent y is of 500      y                 σy
-14     -3.96          0.003                Less than 0.1
-13     -3.68          0.01                 Less than 0.1
-12     -3.39          0.03                 0.15              0.5
-11     -3.11          0.08                 0.4               1.4
-10     -2.83          0.20                 1                 4
 -9     -2.55          0.44                 2                 7
 -8     -2.26          0.88                 4                 14
 -7     -1.98          1.60                 8                 28
 -6     -1.70          2.70                 13.5              48
 -5     -1.41          4.18                 21                74
 -4     -1.13          6.00                 30                106
 -3     -0.85          7.89                 39                138
 -2     -0.57          9.64                 48                170
 -1     -0.28         10.80                 54                191
  0      0.00         11.25                 56                197


Positive values of x repeat all values tabulated in reverse order. 
The table completed would give 53 values of x from x = -26 to
x = 26. For x = 26 and x = -26, y = 0. For values of x
between x = 12 and x = 26, the values of y are too small to plot 
unless a very large scale be used. 

For n = 7, the values of x/σ and σy are as follows, computed
from Table XVII, p. 33, Chap. II.

n = 7, σ = √(7 × 1/2 × 1/2) = √7/2 = 1.323, 1/σ = 0.756.










54. Normal Probability Curve.—From the above computation
for (1/2 + 1/2)^50, it is seen that n does not need to be very large to
make the labor of computation very great. From the polygons
plotted in Fig. 25, it is seen that as n increases from 2 to 50 the
sides of the polygons become shorter and seem to be approaching
closer and closer to a definite curve. It is true that the polygon
determined by

(1/2 + 1/2)^n

does approach a definite curve as n increases without limit. This
is known as the normal probability curve and its equation has been
found to be¹

y = y_0 e^(-x^2 / (2σ^2)).

The y-axis of coordinates passes through the high point of the
curve; so the origin of coordinates is at the same place as in Fig. 25.
The ordinate at x = 0 is y_0, the maximum ordinate of the curve;
e is the base of natural logarithms; σ is the constant used above.
With n = 50 the polygon is so near the curve that it was deemed
best, in order to avoid confusion, not to draw the curve in this
figure. It evidently will be slightly above the polygon for
n = 50, in the middle portion from about x/σ = -1 to x/σ =
+1. In fact, from the equation of this curve, it can be proved
by calculus that the direction of curvature changes at the points
where x/σ = -1 and x/σ = +1. To the left of -1 and to the
right of +1 the curve will be slightly below the polygon for a
considerable distance. The polygon for n = 50 does not quite
reach the horizontal axis until x = -26 and x = +26, where
y = 0. The curve never quite reaches the horizontal axis but
gets closer and closer to it as x/σ increases. In tables for
statisticians² the values of the ordinate y are tabulated for values of
x/σ. At x/σ = 4, y is only 0.013 per cent of the total frequency.

The normal probability curve is also known as the normal
frequency curve, the normal curve of error, the Gaussian curve,
the Gauss-Laplace curve. Since the distributions of many
frequencies follow this approximately, its properties are of great
importance in the statistical treatment of observations. It will
be used later in this book.

¹ G. Udney Yule, “Theory of Statistics,” p. 301, does not use calculus.
Truman L. Kelley, “Statistical Method,” p. 94, gives a simple derivation
using calculus.

² “Tables for Statisticians and Biometricians,” Karl Pearson. “Tables
of Applied Mathematics,” James W. Glover.

55. Rectangular Histogram of Weights of Freshmen.—Now 
construct a rectangular histogram of the weights of freshmen as 
shown in the frequency table, Table XII (Chap. II). Choose 
proper scales to make the figure well proportioned. Remember 
to place scale marks on the heavy lines of the ruling. Round 
numbers are chosen such that subdivisions of the scale are easily 
read. Figure 26 shows the result. It will be noticed that the
histogram tails off to low frequencies in the heavy-weights. If
this tail, as it is called, were cut off, the remaining part of the
histogram would be fairly symmetrical and not far from a symmetric
binomial distribution. There are, however, irregularities.
If the top of the fourth rectangle were cut off and added to the
adjacent rectangle, the distribution would be much smoother. It
is likely that this irregularity is merely an irregularity of the
sample which another sample would not show. A similar situation is
seen in the neighborhood of weights from 160 to 170 lb. There
seem to be two high points. Again it is a question whether
another sample would show the same thing. Perhaps the two 
high points should be cut off and distributed to the two lower ones 
between them. It is a matter which must be decided if a smooth 
curve is to be drawn. 

56. Frequency Polygon of Weights of Freshmen.—Using the
same data, Fig. 27 shows the frequency polygon. The irregularities
above mentioned now stand out more conspicuously than in
the rectangular histogram. If now a smooth curve were made, it
would almost surely be drawn so as to smooth out all the
irregularities unless there is question in regard to whether there should
be two high points or not. In connection with the frequency
tables, it was noted that increasing the class interval wipes out
irregularities of the sample and brings out general tendencies.

Fig. 27.—Frequency polygon of weights of freshman men. (Small class
interval.) (Horizontal scale: weight in pounds.)

57. Effect of Wider Class Interval.—Proceed next to graph 
Table XIII. Figure 28 shows the rectangular histogram, and 
Fig. 29 the frequency polygon. The smoothing out of the 
irregularities due to increasing the class interval is very marked. 
It is now a very simple matter to draw a free-hand smooth curve 
fitting the distribution quite well. The characteristic bell-shaped
curve is obtained though not quite symmetrical. The
tail to the right is considerably longer than to the left.

Fig. 28.—Rectangular histogram of weights of freshman men. (Large class
interval.) (Horizontal scale: weight in pounds.)

Fig. 29.—Frequency polygon of weights of freshman men. (Large class
interval.) (Horizontal scale: weight in pounds.)

58. Frequency Graphs of Freshman Heights.—The heights of 
freshmen as given in Table XIV will now be graphed. The 
rectangular histogram is shown in Fig. 30 and the frequency 
polygon in Fig. 31. Slight tendencies to hump up are shown in
the polygon at heights 63.5 and 72.5 which are hardly noticeable
in the rectangles. It is reasonable to assume that the polygon is
too high at height of 69.5, too low at 71, too high at 72.5, and too
low again at 74. These and similar considerations on the other
half of the polygon assist in smoothing.

Fig. 30.—Rectangular histogram of heights of freshman men.

Fig. 31.—Frequency polygon of heights of freshman men.

59. Comparison with Symmetric Binomial Curve.—If 10 coins
were tossed 629 times, the symmetric binomial frequency curve
that would be expected by virtue of the laws of probability,
showing the frequency of heads, is shown by the dotted curve in
Fig. 31. This originates, of course, from the terms of the
expansion of

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{10}.$$

The closeness of this curve to the free-hand, smooth curve
already drawn is very noticeable. The heights of these 629
freshmen then give an excellent illustration of the following of
a symmetric point binomial frequency distribution by a variable
which is the result of natural phenomena.
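
The expected frequencies behind such a dotted curve may be sketched in a few lines of Python. This is an illustration only: the function name `binomial_frequencies` is chosen here and is not part of the text, and the sketch simply multiplies each term of the expansion by the number of tosses.

```python
from math import comb

def binomial_frequencies(n_coins, n_tosses):
    """Expected number of tosses giving 0, 1, ..., n_coins heads."""
    total_outcomes = 2 ** n_coins
    return [n_tosses * comb(n_coins, heads) / total_outcomes
            for heads in range(n_coins + 1)]

# 10 coins tossed 629 times, as in the comparison with freshman heights.
expected = binomial_frequencies(10, 629)
for heads, frequency in enumerate(expected):
    print(f"{heads:2d} heads: {frequency:6.1f}")
```

The eleven expected frequencies correspond to the eleven possible numbers of heads, 0 to 10, and they add up to the 629 tosses.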

60. Ogive.—Another form of frequency graph called the ogive 
is of considerable importance for some purposes. It is simply a 


Fig. 32.—Ogive of weights of freshman men.




graph of cumulative frequencies. In the frequency polygon the
frequencies were plotted at the midpoints of the classes. This
may be done on the assumption that the midpoint of a class is
the average size of the items occurring in that class. This seems
a reasonable assumption to make if the actual distribution
within the class is unknown. In the case of the ogive the
cumulative frequencies are plotted at the upper boundaries of the
classes. This is done because the cumulative frequency
at any class is the total frequency up to and including that class.
In other words, it is the total frequency up to the upper boundary
of the class. The graph starts on the base line at the lower
boundary of the first class.
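
The plotting rule just described may be sketched in Python. The sketch rests on two assumptions: the midpoints and frequencies are those tabulated for freshman weights in Table XXXI later in the text, and each class boundary is taken half a class interval from the midpoint. The name `ogive_points` is illustrative only.

```python
from itertools import accumulate

# Midpoints and frequencies of the freshman weight classes (Table XXXI);
# the class interval is 11 lb.
midpoints = [105, 116, 127, 138, 149, 160, 171, 182, 193, 204, 215]
frequencies = [15, 43, 138, 162, 129, 82, 35, 16, 5, 3, 1]
CLASS_INTERVAL = 11

def ogive_points(midpoints, frequencies, class_interval):
    """Return (abscissa, cumulative frequency) pairs for the angular ogive."""
    half = class_interval / 2
    # The graph starts on the base line at the lower boundary of the first class.
    points = [(midpoints[0] - half, 0)]
    # Each cumulative frequency is plotted at the upper boundary of its class.
    for midpoint, running_total in zip(midpoints, accumulate(frequencies)):
        points.append((midpoint + half, running_total))
    return points

points = ogive_points(midpoints, frequencies, CLASS_INTERVAL)
for boundary, cumulative in points:
    print(f"{boundary:6.1f}  {cumulative:4d}")
```

Since no frequency is negative, the ordinates never decrease from one point to the next.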

Figure 32 is the ogive of freshman weights plotted from the
cumulative frequency column of Table XIII. Figure 33 is that 
for freshman heights. The angular ogive never goes down to 
the right, since there are no negative frequencies. It would 
remain horizontal at classes with zero frequency. Otherwise it is 
always going up to the right. 

Difference of ordinates between any two points gives the 
increase in number of items between the sizes of item measured 
by the abscissas of the two points. So the steeper the graph the 



more rapid the increase of total frequency. With a symmetric 
bell-shaped distribution, the ogive goes up slowly at first, most 
rapidly at the middle, and slowly again at the end, being symmet¬ 
rical with respect to the middle point. This is very nearly true 
in Fig. 33. In Fig. 32 the flat part at the upper end is much 
longer than the flat part at the lower end. This corresponds to 
the long tail to the right in the frequency polygon (Fig. 29). 

61. Comparison of Ogive with Histogram.—The ogive is more 
easily smoothed than the frequency polygon. The curve does 
not usually have to depart very far from the points plotted from 
the cumulative frequency column. 





In the frequency polygon, the area up to the ordinate of any 
point (the class interval being the unit of horizontal measure) 
equals the total frequency up to that point. The area under the 
smooth histogram up to any ordinate, proper units of measure 
being used, gives the total frequency up to that ordinate. In 
the ogive the ordinate at any point gives the total frequency up 
to that point. At any given value of size of item the ordinate to 
the ogive equals the area under the smooth histogram up to the 
same point. Such a curve as the ogive is called the integral curve 
of the smooth histogram. 

Certain division points are easily determined graphically from 
the ogive. This will be taken up in a later chapter. 

62. Statistical Maps.—A form of graph used a great deal is 
the statistical map. Outline maps are printed for this purpose 
or such a map may be made. The use of the statistical map is 
to show geographic distribution. The data may be shown by 
various colors, by various forms of shading or cross-hatching, 
or by dots. A wide variety of data may be shown in a variety of 
ways. Only one form will be discussed here, the dot map. 
Suppose it is desired to show graphically the number of auto¬ 
mobiles in the states of the United States. Of course, the data 
are first tabulated by states. By examining the data a scale 
may be chosen. The scale is one dot for a certain number of 
automobiles, perhaps one dot for every 10,000. Then an outline 
map of the United States, showing outlines of the states, is taken 
and in each state is placed a number of dots equal to the number of
ten-thousands of automobiles in that state. If the dots are arranged at
the same distance apart everywhere, the numbers of automobiles
in any two different states are readily compared by the size of
the groups of dots in those states. If the dots are scattered
uniformly all over the area of each state, then the density of
automobiles in different states is readily compared. The farther apart
the dots, in this case, the fewer automobiles per square mile.

This example gives some idea of what may be done with the 

statistical map. The subject is very well treated in books on 
graphic charts. 


63. General Rules.—A few general rules may be given for
making charts of various kinds. 

1. Choose the scales carefully. They should be chosen so as 

to cover all the data and make a well-proportioned figure on the 
paper used. 



2. Scales should be such as to bring out the fluctuations of the 
function without unduly magnifying them. 

3. The scales should be large enough to enable one to read the 
graph to the required degree of accuracy. 

4. The scale lines should be ruled as heavy lines. 

5. The scale marks should be placed on the heavy lines of 
the ruled paper. 

6. The scale marks should be integers or round numbers so 
chosen that the subdivisions of the ruled paper are easily read. 

7. In comparing two or more variables, the units of measure 
should be such as to bring out the fluctuations of each. For 
example, if the number of pounds of flour and the number of 
pounds of pepper used by the average family are plotted on such 
a scale as to bring out the fluctuations in flour used, the fluctua¬ 
tions in pepper will scarcely show. If ounces of pepper and 
hundredweights of flour are plotted, it is much more likely that 
the fluctuations in both will be brought out so as to be compared. 

8. Neatness and accuracy are prime requisites. 

Exercises 

1. On millimeter paper plot a horizontal bar chart of the following data of 
passenger automobiles exported to different countries in 1921. 


Leading Foreign Markets for American Passenger Cars 1

                                   Number
Country                       1921       1922
Australia                    3,020     11,236
Canada                       5,243     10,214
Mexico                       6,750      7,279
Belgium                        533      4,785
United Kingdom                 888      4,315
Sweden                         920      3,063
Argentina                      613      2,497
British South Africa           596      2,327
Spain                          421      2,117
Other countries             11,966     18,957
Total                       30,950     66,790

1 From "Commerce Yearbook" for 1922.


















2. Plot the same data as a vertical-bar chart. 

3. Plot a horizontal-bar chart showing the data for both 1921 and 1922.
Plot the data for 1921 in black and those for 1922 in red. Place the bars for 
the two years for each country close together, leaving a wide space between 
the two bars for one country and the two bars for the next country. 

By a glance at the chart tell which country gave the largest per cent of 
increase for 1922 over 1921. 

By a glance of the eye at the chart tell whether exports to any country 
were less in 1922 than 1921. 

Which country took the smallest number of cars in 1921? That country 
took about how many times as many cars in 1922? 

Which country took the smallest number of cars in 1922? That was about 
how many times as many as that country took in 1921? 

4. Make a vertical-bar chart showing the number of passenger cars 
exported from the United States for each year from 1913 to 1922 inclusive. 


United States Exports of Passenger Cars 1

Year        Number
1913        25,880
1914        22,335
1915        41,864
1916        61,922
1917        65,756
1918        36,936
1919        67,145
1920       142,508
1921        30,950
1922        66,790

1 From "Commerce Yearbook" for


In this chart does the increase of number of automobiles exported in 1922
over that of 1921 look as impressive as it did in the chart of Ex. 3? Why?

5. Make a horizontal-bar chart of the following freight data. Make
one bar for each year, subdivided so as to show the amount of freight
originating in each of the three districts. Make the subdivisions for each
district different either by three colors or three forms of cross-hatching.
Make a legend in a convenient place indicating the meaning of each color
or cross-hatching.

Complete the table of data by forming a column of totals for the United
States.



Revenue Freight Originating on Class I Roads, by Districts 1

(In short tons)

                                   Districts
Calendar year        Eastern         Southern          Western
1916             588,934,108      196,392,311      416,673,648
1917             617,844,385      211,474,636      434,696,704
1918             614,703,873      216,081,819      432,558,301
1919             523,810,514      194,564,103      377,736,654
1920             606,786,167      224,127,173      424,507,651
1921             449,674,156      177,258,034      313,250,370
1922             451,585,046      208,582,180      362,942,352

1 From "Commerce Yearbook" for 1922.


If a scale is used such that only three-figure accuracy can be plotted, plot 
the data to the nearest million tons. 

For which district is it easiest to compare the changes from year to year? 

For which district is this comparison most difficult? Why is this? 

6. Make a table showing the percentage each entry of the table in Ex. 5
is of the total for the corresponding year.

Make a 100 per cent bar chart of the results, one bar for each year.

For which year did the eastern division originate the smallest percentage
of freight? Was its total tonnage that year also less than for any other year?

For which year did the southern division originate the greatest percentage 
of freight? Was its total tonnage that year also greater than for any other 
year? 

7. Make a pie diagram of car loadings for each quarter of 1922. Make 
the areas of the circles proportional to the total number of car loadings for 
each quarter. 


Number of Car Loadings, Class I Roads, by Principal Commodities, for 1922 1

Quarter    [illegible]    Live stock         Coal    Forest products    Merchandise
First          606,208       379,587    2,379,387            650,817      2,848,223
Second         499,622       372,899    1,064,595            779,355      3,155,667
Third          685,797       396,315    1,494,101            735,605      2,985,878
Fourth         675,731       489,122    2,510,258            773,269      2,888,044

1 From "Commerce Yearbook" for 1922.







Is there much apparent difference in the sizes of the circles? The total 
for the third quarter was about one-fourth bigger than that for the second 
quarter. Do the corresponding circles seem to have about that proportional 
size? 

8. Reduce the number of car loadings of each commodity to percentage
of the total number of car loadings for the corresponding quarter. Make 
a 100 per cent pie diagram for each quarter. These circles should all have 
the same area. 

The number of car loadings of forest products for the second and fourth 
quarters were about the same. Do the diagrams reveal for which of these 
two quarters the number of car loadings of forest products was the greater 
percentage of the total car loadings for the corresponding quarter? 

9. Make a line chart of the revenue freight data shown in Ex. 5. Use a 
different type of line for each district. Make a legend, in a convenient place, 
showing what each line represents, or label each line in the chart to show 
which district it represents. 

Do the increases and decreases from year to year seem to run about the 
same for all districts? 

Is there any time during the period of seven years when the number of 
tons for the southern district is greater than that for each of the other 
districts? 

10. Plot a line chart of the following population of the state of Washing¬ 
ton. 


Population of Washington

Year       Population
1860           11,594
1870           23,955
1880           75,116
1890          357,232
1900          518,103
1910        1,141,990
1920        1,356,621


Connect each plotted point to the next by a straight line. After this is 
done, draw a smooth curve through the plotted points. 

Does the graph show whether the increases from decade to decade are 
themselves increasing? How? 

If the population increased the same number each year from 1890 to 1900 
how can the population in 1897 be determined graphically? 

11. Choose a pair of coordinate axes and plot the following functions: 

(a) (d)y = -| x , 

(e) y = - ~2 + 2, (/) y = - ^ - 2, (y) y = — (/i) y = |c + 2. 




What are the relative positions of the graphs of (a), (b), and (c)? How could
this situation have been seen without plotting the lines?

What seems to be the position of lines (d), (e), and (f) with respect to (g)?

12. Where do the loci of $x/3 + y/4 = 1$ and $x/3 - y/4 = 1$ cut the
coordinate axes?

13. Find the equation of the straight line through the points (1, 2) and
(2, -1). Also find the equation of the straight line parallel to this line
and passing through the point (2, 3). Where does this latter line cut the
y-axis?

14. How may the locus of $x^2 + y^2 = 16$ be drawn without plotting points?

15. Plot the locus of $y = x^2/2$. What is this curve called? Where is its
vertex?

16. Plot the parabola $y = -x^2/2 - 2x + 2$. Where is its vertex?
Which way does the curve run from the vertex?

17. Make a non-uniform x-scale so that $y = x^3/2$ shall plot as a straight
line.

18. Procure a sheet of semilogarithmic paper and plot on it the population 
of the state of Washington as given in Ex. 10. 

Connect each plotted point to the next by a straight line. 

Did the population increase at a more rapid rate during the first decade 
or during the last decade of the record? What shows this on the graph? 

For how long after 1860 did the population increase at an increasing rate? 
How is this shown? 

During which decade was the rate of increase least? What shows this? 
By graphic interpolation determine the population in 1895. How would 

this result be computed from the tabulated data? 

19. Make a rectangular histogram of mean monthly temperatures at 
Seattle, using the frequency table of Ex. 2, Chap. II. Superimpose a 
frequency polygon on the rectangular histogram. Draw a smooth fre¬ 
quency curve. 

20. Make a rectangular histogram of monthly precipitation at Seattle, 
using the frequency table of Ex. 3 , Chap. II. 

Construct a frequency polygon, separately, for the same data. Draw a 
smooth curve for each figure without comparing either with the other. 

After the two curves are finished, compare the results. 

21. Construct a rectangular histogram of frequencies for all possible
numbers of heads as shown in Table XV.

22. Construct an ogive for the data in each of the three preceding exercises.

23. Make a frequency polygon of the temperature data as shown in the
frequency table of Ex. 6, Chap. II. What characteristic difference is there
between this graph and the polygon of Ex. 19? What is the reason for
this difference?

24. Construct a frequency polygon from the table of Ex. 10, Chap. II.
On the same set of axes plot a polygon of the theoretical frequencies of
Ex. 11, Chap. II. If 10,000 tosses of the 10 coins had been made and the
resulting actual frequencies and theoretical frequencies graphed as above,
would the two polygons be likely to be more nearly alike than the two just
plotted?

25. Make a horizontal-bar chart of the following data of imports.




United States Imports of Merchandise (Specie Values)

Fiscal year              Free          Dutiable
1913             $987,521,162      $825,484,072
1914            1,127,502,699       706,422,958
1915            1,033,526,675       640,643,005
1916            1,492,647,350       705,236,160
1917            1,848,840,520       810,514,645
1918            2,118,599,372       827,056,031
1919            2,230,222,808       865,497,260
1920            3,405,233,003     1,833,119,111
1921            2,137,440,504     1,517,018,842
1922            1,598,888,618     1,009,190,390


26. Make a vertical-bar chart of the data of Ex. 25. 

27. Make a line chart of the data of Ex. 25. 

28. Compute and tabulate for each year the per cent free imports and the
per cent dutiable imports are of the total imports in Ex. 25. Make a per
cent chart of the results.

29. Do the two classes of imports increase and decrease together? Which
of the charts in Exs. 25 to 28 best shows this? Does the per cent of free
imports remain fairly constant or not? What shows this graphically?




CHAPTER IV 


AVERAGES 

Statistics has been called the science of averages. An average 
is a type form of a group of items used to represent the entire 
group. In a sense it is a representative sample consisting of one 
value only. It is not necessary, however, that it actually have 
the value of one of the items of the group it represents. It 
would seem that its size should be somewhere about midway 
in the entire group of sizes. 

1. General Definition.—In general, some items of the group 
are larger than the average and some are smaller. The average 
will satisfy the condition of being such a function of the entire 
group of values that if all of the group happen to be equal to 
each other, then the average equals each one of the group. 1 

2. For Comparison.—Averages make it possible to compare 
different groups. This was suggested in Chap. I in comparing 
heights of trees in different forests. Sometimes comparison of 
averages might be meaningless. This was seen in Chap. I, 
where it was desired to compare the amount of lumber in two 
forests by comparing their average tree heights. 

3. Arithmetic Mean.—When asked to find the average of a
set of numbers, the kind of average that probably comes to mind
is the arithmetic mean. The arithmetic mean of a set of values,
$X_1, X_2, X_3, \ldots, X_N$, is their sum divided by the number of
items in the set.

$$a = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\Sigma X}{N}.$$

In this formula $X_1$ is the size of the first item in the set, $X_2$
the size of the second item, and so on to $X_N$, the size of the
Nth item. N is the number of items. $\Sigma X$ is the sum of all such items
as $X_1$, $X_2$, and so on. This is read "sum of such terms as X" or
"sigma X." The arithmetic mean of the N items is a.

1 Prof Irving Fisher, “The Making of Index Numbers.” 

96 




The arithmetic mean satisfies the general definition of average
given on page 96. For if $X_1 = X_2 = X_3 = \cdots = X_N = k$,
the arithmetic mean,

$$a = \frac{\Sigma X}{N} = \frac{Nk}{N} = k.$$

That is, a equals each one of the set.

4. Important Property of a.—If the totality of all values of
the items $\Sigma X$ is known and the number of items N, it is not
necessary to know the individual items to get the arithmetic mean.

$$a = \frac{\Sigma X}{N}.$$

Or, if a and N are known, the totality of all values is determined.

$$\Sigma X = aN.$$

If a man travels 30 miles in 10 hr., his average speed is $\frac{30}{10} = 3$
miles per hour. To find the average it is not necessary to know 
his speed at any one time. If his speed was less than 3 miles per 
hour at one time, it had to be greater than 3 miles per hour at 
some other time. He may even have stopped to rest part of the 
time, when his speed was nil. Or he may have gone backwards 
part of the time, when his speed was negative. 
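
A brief Python sketch may make the point concrete. The hourly distances below are hypothetical figures invented for illustration (they are not from the text); they include a rest and a backward stretch, yet total 30 miles in 10 hr. The function name `arithmetic_mean` is likewise chosen here.

```python
def arithmetic_mean(values):
    """Sum of the items divided by the number of items."""
    return sum(values) / len(values)

# If every item equals k, the mean equals k (the general definition).
all_equal = arithmetic_mean([7, 7, 7, 7])

# Hypothetical miles covered in each of 10 hours: a rest (0) and a
# backward stretch (-1) are included, and the total is 30 miles.
hourly_miles = [4, 5, 0, -1, 6, 3, 2, 5, 4, 2]
average_speed = arithmetic_mean(hourly_miles)

# The totality of all values is recovered from a and N alone.
total_miles = average_speed * len(hourly_miles)
print(all_equal, average_speed, total_miles)
```

Any other set of ten hourly distances adding to 30 miles would give the same average speed.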

5. Deviations from a.—A valuable property of the arithmetic
mean is that the algebraic sum of the deviations of all the items
from the arithmetic mean is zero.

By deviation of an item from the arithmetic mean is meant the
remainder obtained by subtracting the arithmetic mean from that
item. Letting d stand for deviation:

$$d_1 = X_1 - a, \quad d_2 = X_2 - a, \quad \text{etc.}$$

To prove the sum of the deviations equal to zero, write them
down and add.

$$\begin{aligned}
d_1 &= X_1 - a, \\
d_2 &= X_2 - a, \\
d_3 &= X_3 - a, \\
&\;\;\vdots \\
d_N &= X_N - a.
\end{aligned}$$

Adding,

$$\Sigma d = \Sigma X - Na.$$

Since

$$a = \frac{\Sigma X}{N}, \qquad \Sigma X = Na, \qquad \text{and} \qquad \Sigma d = Na - Na = 0,$$

which was to be proved.
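
The property may be checked numerically. The following Python sketch is an illustration only; it borrows the five numbers used in the short-cut example later in this chapter, uses exact fractions so that no rounding enters, and the name `deviations_from_mean` is chosen here.

```python
from fractions import Fraction

def deviations_from_mean(values):
    """Return each item minus the arithmetic mean, using exact fractions."""
    mean = Fraction(sum(values), len(values))
    return [value - mean for value in values]

items = [86745, 86749, 86758, 86761, 86762]
deviations = deviations_from_mean(items)

# The algebraic sum of the deviations from the mean is zero.
print([int(d) for d in deviations], sum(deviations))
```

The same check succeeds for any list of numbers, since the proof does not depend on the particular items.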

6. Deviations from Any Arbitrary Value.—The average deviation
of all the items from any arbitrary number added to that arbitrary
number equals the arithmetic mean of the items.

Letting the arbitrary number be X', the above theorem stated
in symbols is

$$a = X' + \frac{\Sigma (X - X')}{N}.$$

The proof is as follows:

$$\begin{aligned}
X' + \frac{\Sigma (X - X')}{N} &= X' + \frac{\Sigma X - NX'}{N} \\
&= X' + \frac{\Sigma X}{N} - \frac{NX'}{N} \\
&= X' + \frac{\Sigma X}{N} - X' \\
&= \frac{\Sigma X}{N} \\
&= a.
\end{aligned}$$

Or, more in detail, $\Sigma (X - X')$, the sum of the deviations from
X', is obtained by direct addition

$$\begin{aligned}
&X_1 - X' \\
&X_2 - X' \\
&X_3 - X' \\
&\;\;\vdots \\
&X_N - X'
\end{aligned}$$

Adding, $\Sigma X - NX'$.

Therefore,

$$\Sigma (X - X') = \Sigma X - NX',$$

and the above proof follows at once.
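
As a numerical illustration of the theorem, the Python sketch below computes the mean from several arbitrary values of X'. The five items are those of the example in the next section; the arbitrary values tried and the name `mean_from_assumed` are chosen here for illustration.

```python
def mean_from_assumed(values, assumed):
    """Arbitrary number plus the average deviation of the items from it."""
    deviations = [value - assumed for value in values]
    return assumed + sum(deviations) / len(values)

items = [86745, 86749, 86758, 86761, 86762]

# Whatever arbitrary number X' is taken, the same mean results.
for assumed in (86758, 86700, 0, 100000):
    print(assumed, mean_from_assumed(items, assumed))
```

A value of X' close to the items keeps the deviations small, which is what makes the method a labor-saving one by hand.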

7. Short-cut.—This property of the arithmetic mean makes it
possible to take deviations from a convenient number and from
these compute the mean. This process is known as the short-cut
method of obtaining arithmetic mean. This may be illustrated
by an example. Find the arithmetic mean of the numbers in
the following column:

      86,745
      86,749
      86,758
      86,761
      86,762
   5)433,775
      86,755 = a.

Or, letting X' equal the middle number of the set,

      X          $d_{X'}$
   86,745        -13
   86,749        - 9
   86,758          0
   86,761          3
   86,762          4
   N = 5     $\Sigma d_{X'} = -15$

$$a = X' + \frac{\Sigma d_{X'}}{N} = 86{,}758 + \frac{-15}{5} = 86{,}755.$$

This saved the bother of adding the set of five-figure numbers.
Someone may say that since the first three figures are the same for
all these numbers, it is only necessary to average the numbers
given by the last two figures, and annex the result to the first
three figures. This is only the short-cut method, using 86,700
for X'.

This method may not seem to be a very short-cut, but, as
applied in frequency tables and correlation tables, it will be
found to be truly a short-cut.

8. Arithmetic Mean from Frequency Table.—Finding the arithmetic
mean from a frequency table may be illustrated by use of
Table XIII of freshman weights. The work is tabulated in
Table XXXI. The usual assumption to make in regard to the
distribution of the weights in any class is that the weights vary
uniformly from item to item in the class. Under this assumption
the average for any class would be the midpoint of the class.
The sum of all the items in the class would be this midpoint, or
average, multiplied by the frequency for that class, fX. The
size of item X in Table XXXI is tabulated as the midpoints of
the classes.


Table XXXI.—Arithmetic Mean of Weights of Freshman Men in Pounds

Size of item      Frequency
     X                f              fX
    105               15           1,575
    116               43           4,988
    127              138          17,526
    138              162          22,356
    149              129          19,221
    160               82          13,120
    171               35           5,985
    182               16           2,912
    193                5             965
    204                3             612
    215                1             215
   Total          N = 629         89,475

$$a = \frac{\Sigma fX}{N} = \frac{89{,}475}{629} = 142.25.$$
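
The tabulated work may be reproduced with a short Python sketch, offered as an illustration only. The midpoints and frequencies are those of the table; the name `mean_from_frequency_table` is chosen here.

```python
# Class midpoints X and frequencies f of freshman weights, from the table.
midpoints = [105, 116, 127, 138, 149, 160, 171, 182, 193, 204, 215]
frequencies = [15, 43, 138, 162, 129, 82, 35, 16, 5, 3, 1]

def mean_from_frequency_table(midpoints, frequencies):
    """Return (sum of fX, N, arithmetic mean) for a frequency table."""
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    n = sum(frequencies)
    return sum_fx, n, sum_fx / n

sum_fx, n, mean = mean_from_frequency_table(midpoints, frequencies)
print(sum_fx, n, round(mean, 2))
```

Each product fX stands for the sum of all the items in a class, on the assumption that the class midpoint is the class average.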


9. Short-cut.—The work for the short-cut method is shown in
Table XXXII. Let X' be chosen near the middle of the table so
that the deviations $d_{X'}$ from X' will be as small as possible.
Choose X' as one of the tabulated values of X so that $d_{X'}$ will be
0 for that class. Also, if X' is one of the tabulated values of X
the deviations $d_{X'}$ will be multiples of the class interval, + for
values of X greater than X', and - for values of X less than X'.
It is also desirable that X' differ but little from a, so that
$\frac{\Sigma f d_{X'}}{N}$ shall be small.



Table XXXII.—Arithmetic Mean of Weights of Freshman Men
Short-cut Method

Size of item, in pounds     Frequency
          X                     f          $d_{X'}$      $f d_{X'}$
         105                    15           -55          -   825
         116                    43           -44          - 1,892
         127                   138           -33          - 4,554
         138                   162           -22          - 3,564
         149                   129           -11          - 1,419
         160                    82             0                0
         171                    35            11              385
         182                    16            22              352
         193                     5            33              165
         204                     3            44              132
         215                     1            55               55
        Total               N = 629                       -11,165

$$a = X' + \frac{\Sigma f d_{X'}}{N},$$

$$a = 160 + \frac{-11{,}165}{629},$$

$$a = 160 - 17.75,$$

$$a = 142.25.$$
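
The short-cut computation may also be sketched in Python, again as an illustration only. The name `short_cut_mean` is chosen here; the second call, with X' = 138, is an added check not made in the text, showing that another tabulated value of X leads to the same mean.

```python
midpoints = [105, 116, 127, 138, 149, 160, 171, 182, 193, 204, 215]
frequencies = [15, 43, 138, 162, 129, 82, 35, 16, 5, 3, 1]

def short_cut_mean(midpoints, frequencies, assumed):
    """Return (sum of f*d, mean), with deviations d taken from `assumed`."""
    n = sum(frequencies)
    # Each deviation is weighted by its class frequency before adding.
    sum_fd = sum(f * (x - assumed) for x, f in zip(midpoints, frequencies))
    return sum_fd, assumed + sum_fd / n

sum_fd_160, mean_160 = short_cut_mean(midpoints, frequencies, 160)
sum_fd_138, mean_138 = short_cut_mean(midpoints, frequencies, 138)
print(sum_fd_160, round(mean_160, 2))
print(sum_fd_138, round(mean_138, 2))
```

Note that the deviations are multiplied by the frequencies before they are added; adding the deviation column alone would count only one item from each class.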


10. Warning.—Do not make the mistake of adding the $d_{X'}$
column. This sum would give the sum of deviations for one item
only from each class, or 11 items in all, instead of 629. There
are 15 items with an average deviation of -55, making
$-55 \times 15 = -825$ for the sum of the deviations of these 15 items.
Similarly, the sum of the deviations for the 43 items of the
second class is $-44 \times 43 = -1{,}892$, and so on down through
the table. The sum of the deviations for the whole 629, that is,
for the entire table and not for the $d_{X'}$ column alone, is

$$\Sigma d_{X'} = \Sigma (X - X') = -12{,}254 + 1{,}089 = -11{,}165.$$

This last sum may be written

$$\Sigma f d_{X'} = -11{,}165$$

if the context would thus be made clearer.

11. Another Example.—Table XXXIII is for finding the
arithmetic mean of heights of freshmen. The data are obtained
from Table XIV.



Table XXXIII.—Arithmetic Mean of Heights of Freshman Men

Size of item, in inches     Frequency
          X                     f          $d_{X'}$      $f d_{X'}$
         60.5                    3          -7.5          - 22.5
         62.0                    8          -6.0          - 48.0
         63.5                   33          -4.5          -148.5
         65.0                   51          -3.0          -153.0
         66.5                  115          -1.5          -172.5
         68.0                  156           0               0
         69.5                  148           1.5           222.0
         71.0                   64           3.0           192.0
         72.5                   43           4.5           193.5
         74.0                    5           6.0            30.0
         75.5                    3           7.5            22.5
        Total               N = 629                        115.5

$$a = 68.0 + \frac{115.5}{629} = 68.0 + 0.18,$$

$$a = 68.18.$$


12. If a is Not One of the Given Items.—The arithmetic mean 
is frequently not one of the sizes of items in the universe repre¬ 
sented. It may even be an impossible number. For example, 
the average size of family may be given as 5.63 persons. Of 
course, there could not be a fractional number of persons in a 
family. This need not destroy its value as an average. 

13. Not Always Representative.—The arithmetic mean may 
not always be truly representative of the entire group of items. 
For example, suppose the following gives the height of trees in a
dooryard.

Height of Trees in a Dooryard

Height, in feet      Number
      X                f

$$a = \frac{750}{10} = 75 \text{ ft.}$$










If the statement is made that the 10 trees in this dooryard aver¬ 
age 75 ft. high, someone seeing them might easily think that the 
average height was overstated. Seventy-five feet is not a truly 
representative height, as it could not be used as a single number 
describing the entire group of trees. Some other kind of average 
would doubtless be better. 

14. Mode.—Another average is suggested by the remark:
"Mr. A is a man of average height."

It is quite likely that the speaker means that Mr. A is of that
height which occurs most frequently, the height, one may say,
which is the style, la mode. Such an average is called the mode.
The mode, then, may be defined as that size which occurs most
frequently. If the items tend to concentrate about the mode, they
are said to be stable or true to type.

The mode satisfies the general definition of an average, for if

$$X_1 = X_2 = X_3 = \cdots = X_N = k,$$

the value which occurs most frequently is k, equal to each of the
set.


A stricter definition may be given from consideration of a 
frequency distribution. A mode is a value whose frequency is 
greater than the frequency of the smaller or of the larger sizes in the 
immediate neighborhood. By this definition there might be more 
than one mode in a given set of values of a variable. An example 
may be seen in Table XII of freshman weights. The corre¬ 
sponding frequency polygon (Fig. 27) shows two quite distinct 
high points at weights 130.5 and 142.5 lb. These two modal 
weights may be modes for this sample of 629 freshmen only. It 
may be there is really only one modal weight for all freshmen. 
Some variables, however, do have more than one proper mode. 

15. Mode Shown by Frequency Graph.—If there is a decided
mode, a frequency table or frequency graph nearly always shows 
near what size of item it lies. Table XIII of weights of freshmen 
indicates a mode in class 133-143 lb. The smooth curve of 
Fig. 29 indicates the modal weight to be about 136 lb., this weight 
occurring at the ordinate of greatest length. Figure 31 seems to 
indicate a modal height of freshmen at about 68.25 in. 

16. Formula for Mode.—In order to derive a formula for 
computing the mode from a frequency table in which the modal 
class is well defined, examine Fig. 30, the rectangular histogram of 
freshman heights. The modal class (the class containing the 
mode) seems to be class 67.5-68.5. The height of the rectangle 




of the next class to the right is greater than the height of the next 
rectangle to the left. It may be said, then, that the frequency of 
class 69.0-70.0 has greater weight in determining the mode than 
does the frequency of class 66.0-67.0. The use of the word 
“weight” suggests taking these two frequencies as weights and 
balancing them. 

Let

$f_1$ = frequency of class next lower than the modal class.
$f_2$ = frequency of class next higher than the modal class.
$l$ = lower boundary of the modal class.
$c$ = class interval.
$Z$ = mode.
$x$ = distance from lower boundary of the modal class to where
the weights $f_1$ and $f_2$ balance.

Fig. 34.

Hang weight $f_1$ at the lower boundary of the modal class and
weight $f_2$ at the upper boundary of that class. The situation is
seen in Fig. 34.

From 0 of the scale to A of the figure equals $l$. From 0 to B
is the upper boundary of the modal class. $AB = c$, $AC = x$.

In order that the weights $f_1$ and $f_2$ shall balance at C

$$f_1 \times AC = f_2 \times CB,$$

or

$$f_1 x = f_2 (c - x).$$

Solving for x gives

$$x = \frac{f_2 c}{f_1 + f_2}.$$

Then

$$Z = l + \frac{f_2 c}{f_1 + f_2}.$$


17. Illustrations.—Applying the formula in Table XIV:

$$f_1 = 115, \quad f_2 = 148, \quad l = 67.25, \quad c = 1.5.$$

$$x = \frac{148 \times 1.5}{115 + 148} = \frac{222}{263} = 0.84.$$

$$Z = 67.25 + 0.84 = 68.09.$$

This value of Z is less than that obtained graphically in Fig. 31.
It is about 0.1 in. (1/15 of a class interval) greater than the
midpoint of the modal class. This indicates a greater tendency to
heights larger than the midpoint of the modal class than to
heights less than the midpoint of the modal class.

By the same formula compute the modal value of freshman
weights from Table XIII. The class of greatest frequency is
133-143.

$$f_1 = 138, \quad f_2 = 129, \quad l = 132.5, \quad c = 11.$$

$$Z = 132.5 + x = 132.5 + \frac{129 \times 11}{138 + 129} = 132.5 + 5.3 = 137.8.$$

The mode, 137.8, is a little less than 138, the midpoint of the
modal class. This could be foreseen, since in Fig. 28 the next
rectangle to the left of the modal class is higher than the next to
the right.

The mode is 0.2 lb. less than 138 lb., the midpoint of the modal
class; 0.2 lb. is about two-hundredths of a class interval. This
is much smaller than one-fifteenth, the deviation of the modal
height from the midpoint of the modal class. This was to be
expected since $f_1$ and $f_2$ are more nearly equal in the case of
weights than in the case of heights.
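
The balancing formula and the two illustrations may be checked with a short Python sketch. It is offered as an illustration only; the function and argument names are chosen here and do not appear in the text.

```python
def mode_by_balancing(f_lower, f_upper, lower_boundary, class_interval):
    """Mode as the point where the neighboring class frequencies balance.

    f_lower is hung at the lower boundary of the modal class and f_upper
    at the upper boundary; x is the distance from the lower boundary to
    the balance point.
    """
    x = f_upper * class_interval / (f_lower + f_upper)
    return lower_boundary + x

modal_height = mode_by_balancing(115, 148, 67.25, 1.5)
modal_weight = mode_by_balancing(138, 129, 132.5, 11)
print(round(modal_height, 2), round(modal_weight, 1))
```

When the two neighboring frequencies are equal, x is half the class interval and the mode falls at the midpoint of the modal class.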

18. Modal Class Not Well Defined.—If Table XII or Fig. 27 is 
used to determine modal weight of freshman men, the question 
at once arises as to whether the two modes indicated are apparent 
or real. From what was known of weights of men when this table 
and graph were made, there is no reason for thinking there should 
be two modes. This table serves very well for illustrating a 



method of determining mode when the modal class is not well 
defined. This is known as the method by grouping. 


Table XXXIV.—Mode by Grouping Weight of Freshmen
From Table XII

Number    Midpoint    Fre-                           Groups
of class  of class X  quency
                               I    II   III    IV     V    VI   VII  VIII    IX
    1      110.5        15    27
    2      114.5        12          34    49                83
    3      118.5        22    56                68               112
    4      122.5        34          78               100               162
    5      126.5        44   106         140                                 197
    6      130.5        62         119         163         219
    7      134.5        57   113                     175         236
    8      138.5        56         117   174                           229
    9      142.5        61   116               172                           214
   10      146.5        55          97               158   198
   11      150.5        42    82         137                     162
   12      154.5        40          65         107                     137
   13      158.5        25    55                      95                     109
   14      162.5        30          44    69                84
   15      166.5        14    29                59                72
   16      170.5        15          28                42                52
   17      174.5        13    23          38
   18      178.5        10




Table XXXV.—Auxiliary Table 

                 Number of class
Group     5    6    7    8    9   10   11   12
I                             x    x
II             x    x
III                 x    x    x
IV                       x    x    x
V              x    x    x
VI        x    x    x    x
VII            x    x    x    x
VIII                x    x    x    x
IX                       x    x    x    x
Total     1    4    6    7    6    4    1

Mode at midpoint of class 8, Z = 138.5. 


19. Mode by Grouping.—It has already been seen that 
increasing the class interval tends to smooth out irregularities 
due to the sampling. From Table XII the simplest way to 
increase the class interval is to double it by making a new classi¬ 
fication whose first class includes the first two classes of Table 
XII, whose second class includes the third and fourth classes of 
Table XII, and so on. A new classification with double the 
class interval of Table XII may also be formed by making its 
first class include the second and third classes of Table XII, its 
second class to include the fourth and fifth classes of Table XII 
and so on. If both these new classifications are formed, it may 
happen that there is included only one modal class in each. If 
these modal classes overlap, it is reasonable to suppose that the 
class of Table XII which is common to both is the true modal 
class. If they do not overlap, either there is no distinct modal 
value, or else a further widening of the class interval is necessary 
to reveal it. The next step would then be to make new classifications 
with class interval three times that of the original table, 
Table XII. This may be done in three ways. The first class of 
the new classification may include the first three classes of the 
original table, it may include the second, third, and fourth classes, 
or it may include the third, fourth, and fifth classes of the original 
table. All three classifications should be formed. If each one 
develops a modal class and by inspection it is found that one of 
the classes of the original table is common to all three, it is quite 
likely that this is the modal class sought. It may happen, how¬ 
ever, that in considering all five of the new classifications it will be 
found that there is not one and one only of the original classes 
occurring in these new classifications more times than does any 
other. This may make the modal class still uncertain. If so, 
four new classifications may be made in the same manner, each 
with a class interval equal to four times the class interval of the 
original table. This done, there are now formed nine classifica¬ 
tions or frequency tables with class intervals greater than that of 
the original table. These nine groupings are examined as to their 
modal classes for the purpose of finding out how many times 
different classes of the original table occur in these modal classes. 
That class which occurs most often in these modal classes is 
considered to be the true modal class. Its midpoint is taken as 
the mode. 

Usually it is not necessary to carry the process so far if there 
is really a single mode. Usually it will not be of any use to carry 
the process any farther if a modal class has not been found. 

20. The Method Applied.—The process may be condensed and 
tabulated as shown in Table XXXIV. The column headings are 
self-explanatory. It is evident that the first three and the last 
nine classes of Table XII will have no appreciable effect on the 
mode and so may be omitted from Table XXXIV. 

21. Auxiliary Table.—Table XXXV is an auxiliary table for 
making it easy to pick out the class of the original classification 
which occurs most times in the modal classes of the new classi¬ 
fications. In following down the column of group I, the first 
new classification, a maximum frequency of 116 is found, which 
includes classes numbered 9 and 10 of the original classification. 
These two classes are then checked off in Table XXXV with group 
I. Similarly for group II, the maximum frequency of 119 
includes classes 6 and 7 of the original classification. These 
are checked off with group II in Table XXXV. The checking of 
classes in the maximum frequencies for each of the groups is 
continued to the end of the table. 

When the first two groupings are completed, it is found that the 
two classes of maximum frequency do not overlap. The first 
includes classes 9 and 10 while the second includes classes 6 and 7. 
So no modal class is determined. When the next three groupings 
are completed and checked off in Table XXXV, it is at once seen 
that the class 8 is common to all three. It is quite likely then 
that class 8 is the modal class. If all five of the groupings are 
considered, however, it is seen that classes 7, 8, and 9 each occur 
in three of the maximum frequencies of these groupings. Classes 
6 and 10 each occur in two of the maximum frequencies. It 
still seems likely that the mode is the midpoint of class 8. By 
continuing the process of new classifications to form more group¬ 
ings and by checking the classes of maximum frequency in Table 
XXXV, it is found that class 8 occurs in 7 of the maxima. This 
is a greater number of times than any other class occurs. It is 
then concluded that class 8 is the modal class and its midpoint, 
138.5, is regarded as the mode. Note the totals of number of 
times classes 5 to 11 inclusive occurred in the maximum frequencies 
of the groups. They are 1, 4, 6, 7, 6, 4, 1, shown in the last 
line of Table XXXV. The symmetry of arrangement of these 
numbers strengthens the conclusion that the midpoint of class 8 
is the mode. Such symmetry could not be expected to always 
occur. 

There is no doubt now that the two modes displayed in Fig. 27 
were apparent only. In drawing the smooth curve to show gen¬ 
eral tendencies the two high points should be smoothed out. 
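
The grouping procedure of sections 19 to 21 can also be carried out mechanically. The following Python sketch is offered only as an illustration. The function name `grouping_counts` and the rule that the first maximum is taken when two groups tie are assumptions made here, not part of the method as stated. The frequencies are those of the 18 classes of Table XXXIV.

```python
# Frequencies of classes 1 to 18 of Table XXXIV (midpoints 110.5, 114.5, ..., 178.5).
FREQS = [15, 12, 22, 34, 44, 62, 57, 56, 61, 55, 42, 40, 25, 30, 14, 15, 13, 10]
MIDPOINTS = [110.5 + 4 * k for k in range(len(FREQS))]


def grouping_counts(freqs, max_width=4):
    """Count how often each original class lies in a group of maximum frequency.

    For every width 2..max_width and every possible first class, adjacent
    classes are combined into groups. The group with the largest total is the
    modal group of that classification (on a tie the first is taken, which is
    an assumption of this sketch).
    """
    counts = [0] * len(freqs)
    for width in range(2, max_width + 1):
        for offset in range(width):
            starts = range(offset, len(freqs) - width + 1, width)
            best = max(starts, key=lambda s: sum(freqs[s:s + width]))
            for k in range(best, best + width):
                counts[k] += 1
    return counts


counts = grouping_counts(FREQS)
modal_index = counts.index(max(counts))
print(counts[4:11])            # occurrences of classes 5 to 11
print(MIDPOINTS[modal_index])  # midpoint of the class occurring most often
```

The nine classifications produced by widths 2, 3, and 4 correspond to groups I to IX, and the printed counts may be compared with the last line of Table XXXV.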

22. Mode from the Ogive.—The mode may be determined 
graphically from the ogive by determining where the curve is 
steepest. At that point drop a perpendicular to the base line. 
The scale mark where this perpendicular strikes the base line is 
the value of the mode. The steep portion of the curve is usually 
so nearly straight that a single point where steepness is greatest 
is not easy to determine. For this reason this is not a very satis¬ 
factory way of determining the mode. It may be done fairly 
well by beginning on the lower part of the curve, moving a straight 
edge along the curve, and keeping it tangent to the curve. It 
rotates counterclockwise. Mark the point on the curve where 
the tangent ceases to rotate. Then do the same way, beginning 
on the upper part of the curve, locating another point where the 
tangent ceases to rotate. Halfway between the two points should 
be close to the point where the curve is steepest. Doing this in 
Fig. 32 gives the mode as about 140 lb. 




23. Comparison of Results.—Compare, now, the values of 
the mode as obtained by the different methods. 

Graph (Fig. 28), about 137.5. 

Formula (Table XIII), 137.8. 

Grouping (Table XXXIV), 138.5. 

Ogive (Fig. 32), 140. 

24. Effect of Shifting Class Limits.—The first three of these 
values are not very different. The question may arise as to 
which is the best value. That at once brings up the question as 
to what difference it would make if the class limits of Table XIII 
were shifted. With a class interval of 11, there are 11 different 
possible numbers with which to begin the first class. One, for 
example, would be to make the limits of the first class 93 to 103. 

Turning to Table II, Chap. I, the resulting frequency table 
can at once be made from the first and last columns. The result 
is Table XXXVI. 


Table XXXVI.—Weights of Freshman Men 

Classes in pounds    Frequency
        X                f
     93-103               3
    104-114              21
    115-125              84
    126-136             145
    137-147             159
    148-158             108
    159-169              60
    170-180              34
    181-191               8
    192-202               4
    203-213               2
    214-224               1

                    N = 629

The modal class is indicated as 137-147. 

\[ Z = 136.5 + \frac{11 \times 108}{145 + 108} = 136.5 + 4.7 = 141.2. \]
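
The computation just made can be repeated for any position of the class limits. The short Python sketch below is an illustration under stated assumptions: the function name `mode_by_formula` is hypothetical, the class boundary is taken half a unit below the lower class limit, and the modal class is assumed to be neither the first nor the last class. The data are those of Table XXXVI.

```python
# Table XXXVI: class limits in pounds and the corresponding frequencies.
CLASSES = [(93, 103), (104, 114), (115, 125), (126, 136), (137, 147), (148, 158),
           (159, 169), (170, 180), (181, 191), (192, 202), (203, 213), (214, 224)]
FREQS = [3, 21, 84, 145, 159, 108, 60, 34, 8, 4, 2, 1]


def mode_by_formula(classes, freqs):
    """Lower boundary of the modal class plus c * f_above / (f_below + f_above)."""
    m = freqs.index(max(freqs))                   # position of the modal class
    lower_boundary = classes[m][0] - 0.5          # e.g. 137 gives 136.5
    interval = classes[m + 1][0] - classes[m][0]  # class interval c
    f_below, f_above = freqs[m - 1], freqs[m + 1]
    return lower_boundary + interval * f_above / (f_below + f_above)


print(round(mode_by_formula(CLASSES, FREQS), 1))
```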


This result is considerably larger than any of the values previously 
found. The mode, then, may be affected by the position 
of class limits. What position of class limits shall be used? By 
tabulating and plotting frequency polygons for five different 
positions of class limits, ranging over the different possible posi¬ 
tions, it was judged that the classes as given in Table XIII seemed 
to be most fairly representative. The mode was computed for 
each one. The values ranged from 135.9 to 142.0. The average 
of the five values was found to be 139.3. 

25. Effect of Position of Class Limits in Grouping.—Treating 
each group of Table XXXIV as a frequency table and computing 
the mode by formula, the values range from 133.3 to 143.9 
with an average of 138.7. This average is very close to the value 
obtained by the method of grouping, as would be expected. The 
value obtained by grouping, being an approximate average 
of results from a systematic group of frequency distributions, 
would seem to be a reasonable result. 

The extreme values came from groups I and II. Examination 
of Table XXXV would show this. Neither of these groupings 
would have given a fair representative frequency table for use in 
statistical analysis. 

Having once decided on the best position for class limits, if a 
decided modal class appears, the method by formula or by graph 
would ordinarily be used. 

26. Effect on Arithmetic Mean.—At this point it may be of 
interest to see whether the position of class limits of Table XXXVI 
makes much difference in the arithmetic mean as computed from 
the frequency table. The computation can be carried out by 
adding an fX column. If this is done, ΣfX is found to be 89,461, 
giving the arithmetic mean as 142.23. From Table XXXI it 
was found to be 142.25. There is no appreciable difference. 
This justifies the assumption in this case that the midpoint of 
a class may be used as the average of the items in that class. 
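
The fX column for Table XXXVI can be checked with a few lines of Python. This is only a sketch; the variable names are chosen here for illustration, and each class midpoint is taken as the average of the two class limits.

```python
# Table XXXVI: class limits in pounds and the corresponding frequencies.
CLASSES = [(93, 103), (104, 114), (115, 125), (126, 136), (137, 147), (148, 158),
           (159, 169), (170, 180), (181, 191), (192, 202), (203, 213), (214, 224)]
FREQS = [3, 21, 84, 145, 159, 108, 60, 34, 8, 4, 2, 1]

# Midpoint X of each class: 98, 109, 120, ..., 219.
midpoints = [(low + high) / 2 for low, high in CLASSES]

n = sum(FREQS)                                            # total frequency N
total_fx = sum(f * x for f, x in zip(FREQS, midpoints))   # sum of the fX column
mean = total_fx / n

print(n, total_fx, round(mean, 2))
```

The sum of the fX column comes out as 89,461, and the quotient agrees with the value quoted in the text to within 0.01.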

27. Median.—If a military company is lined up, company 
front, with the shortest man at one end, the next taller next, and 
so on up to the tallest man at the other end, the men are arrayed 
as to height. The height of the man at the middle of the array is 
the median height of the men in the company. It is called the 
median man height of the men in the company. If there is an 
even number of men in the company, there is no middle man. 
Then the median may be considered as the arithmetic mean of 
the heights of the two middle men. It is a comparatively easy 
matter to array, count up, and find the median of a set of 
values of a variable, provided the total number N of values is 
not large. 

28. More Precise Definition.—If N is large, the items may be 
tabulated in a frequency table, and that class between whose 
boundaries the median lies is easily determined from the column 
of cumulative frequencies. For example, use Table XIII of 
weights of freshman men. 

N = 629. 

N/2 = 314.5. 

From the column of cumulative frequencies it is seen that the 
first three classes contain 196 items, and the first four classes 
contain 358 items. 

196 < 314.5 < 358. 

The median lies, therefore, between the boundaries of the 
fourth class, 

132.5 < M < 143.5. 

Think of all the items of the table as being put in array, each 
item occupying a unit segment of the base line on which it stands. 
At points on the base line whose abscissas are equal to the cumula¬ 
tive frequencies, plot the respective values of the class boundaries 
as ordinates. Connect the top of these ordinates by straight lines 
from each to the next, forming a graph of the array. Then on 
the assumption that within any class there is uniform variation 
of size of item throughout that class, the ordinate to the graph at 
the midpoint of any unit segment of the base line will measure the 
size of item standing on that unit segment. The entire length 
of base line occupied by the array equals N. The point halfway 
through the array will be at the scale mark N/2. The value of 
the ordinate at this point is defined as the median. 

For a small number of items, such as the man heights of the 
military company, ordinates may be drawn at the midpoints of 
unit segments of the base line representing each man height, and a 
smooth curve drawn through the tops of these ordinates. The 
median is the value of the ordinate to this curve at the midpoint 
of the base line used. As an illustrative example, in Fig. 35 are 
arrayed the heights of a squad of 16 men selected at random 
throughout the company. The 16 ordinates measuring the 
heights occupy 16 units of base line. A curve is drawn through 
the tops of these ordinates. Halfway through the array at scale 
mark 8, the ordinate 8.4 is drawn. Its length, 5.7 ft., is the value 
of the median man height of the squad. 

Fig. 35.—Array of heights of sixteen soldiers. (Horizontal scale: units of base line.) 

Fig. 36.—Graph of median class of freshman weights for determining formula for 
the median. (Horizontal scale: units of base line.) 

29. Application to Frequency Table.—For the median freshman 
weight, it is not necessary to graph the entire array. Only 
the class in which the median lies need be used. As has just been 
seen, the lower boundary of this class is 132.5 and the upper 
boundary, 143.5. The number in the class is 162. Proper 
scales are selected and the graph drawn for this class only (Fig. 
36). There are 196 items up to this class. An ordinate equal to 
132.5, the lower boundary of the class, is drawn at 196 on the 
base line. An ordinate equal to 143.5, the upper boundary of the 
class, is drawn at 358 on the base line. The upper ends of these 
ordinates are connected by the straight line AD. At 314.5, the 
point halfway through the entire array, the ordinate measuring the 
median is drawn, meeting the line AD at B. Through A a hori¬ 
zontal line is drawn cutting the ordinates to B and D at the 
points C and E, respectively. Through B a horizontal line is 
drawn meeting the ordinate to D at F. 

30. Interpolation Formula for Median.—Use the following 
notation: 

M = median. 

N = number of items in entire array, entire length of 
base line used. 

l = HA = lower boundary of the class used. 

l' = GD = upper boundary of the class used. 

c = ED = class interval. 

f = AE = frequency of the class. 

Fm = cumulative frequency up to the class. 

F'm = cumulative frequency down to the class from the 
upper end of the array. 

i = AC = N/2 - Fm = number of units of base line 
that must be passed from the lower boundary of the 
class to reach the median. 

i' = BF = N/2 - F'm = number of units of base line 
that must be passed from the upper boundary of the 
class to reach the median. 

x = CB = amount to be added to l to give M. 

x' = FD = amount to be subtracted from l' to give M. 

From the frequency table: 

N = 629. 
l = 132.5. 
l' = 143.5. 
c = 11. 
f = 162. 
Fm = 196. 
F'm = 271. 

\[ i = \frac{629}{2} - 196 = 314.5 - 196 = 118.5, \]

\[ i' = \frac{629}{2} - 271 = 314.5 - 271 = 43.5. \]

From the similar triangles ACB and AED in the figure, 

\[ \frac{CB}{ED} = \frac{AC}{AE}. \]

Substituting from the above notation, 

\[ \frac{x}{c} = \frac{i}{f}, \]

whence 

\[ x = \frac{ci}{f}, \]

and, since M = l + x, 

\[ M = l + \frac{ci}{f}. \]

This is the formula for the median interpolating from the 
lower boundary up. 

Also from the similar triangles BFD and AED, 

\[ \frac{FD}{ED} = \frac{BF}{AE}, \]

whence, 

\[ \frac{x'}{c} = \frac{i'}{f}, \]

and 

\[ M = l' - x', \]

or 

\[ M = l' - \frac{ci'}{f}, \]

the formula for the median interpolating from the upper boundary 
down. 




Using the first formula and the values from the frequency table, 

\[ M = 132.5 + \frac{11 \times 118.5}{162} = 132.5 + 8.05 = 140.55. \]

Or, from the second formula, 

\[ M = 143.5 - \frac{11 \times 43.5}{162} = 143.5 - 2.95 = 140.55. \]


The two formulas, which are consistent, must give the same 
result. Use of both gives a check on the work. 
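
The check by both formulas is easily mechanized. In the Python sketch below the function and parameter names are hypothetical, chosen only for this illustration. The numbers are those read from the frequency table of freshman weights.

```python
def median_from_below(lower, interval, freq, n, cum_below):
    """Interpolate upward from the lower boundary of the median class."""
    i = n / 2 - cum_below        # units of base line above the lower boundary
    return lower + interval * i / freq


def median_from_above(upper, interval, freq, n, cum_above):
    """Interpolate downward from the upper boundary of the median class."""
    i_prime = n / 2 - cum_above  # units of base line below the upper boundary
    return upper - interval * i_prime / freq


m_up = median_from_below(132.5, 11, 162, 629, 196)
m_down = median_from_above(143.5, 11, 162, 629, 271)
print(round(m_up, 2), round(m_down, 2))
```

Since the two cumulative frequencies and the class frequency together make up N, the two results must agree. A disagreement would point to an error in reading the table.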

31. Example.—The median freshman height is computed 
from Table XIV. 


N = 629. 
l = 67.25. 
l' = 68.75. 
c = 1.5. 
f = 156. 
Fm = 210. 
F'm = 263. 
i = 314.5 - 210 = 104.5. 
i' = 314.5 - 263 = 51.5. 

Then 

\[ M = 67.25 + \frac{1.5 \times 104.5}{156} = 67.25 + 1.01 = 68.26, \]

or 

\[ M = 68.75 - \frac{1.5 \times 51.5}{156} = 68.75 - 0.49 = 68.26. \]

32. Median from the Ogive.—The median may be at once 
determined graphically from the ogive by bisecting the total height 
of the curve. From the point of bisection draw a horizontal line 
to the curve. From the point where it meets the curve, drop a 
vertical line to the base line. The reading of the horizontal 
scale at this point is the value of the median. In this manner 
from Fig. 32 the median freshman weight is found to be about 
140½ lb. From Fig. 33 the median height is about 68¼ in. 
These results are very close to the values that were computed 
by interpolation formula. These were 140.55 lb. and 68.26 in. 

33. Theorem.—If a set of values of a variable be placed in 
array and the median is taken as that value which stands in the 
middle of the array, the following theorem may be proved: 

The sum of differences from the median, dropping algebraic 
signs, is a minimum. 

This means that if the differences between the median and each 
of the other items be added, the sum will be less than the sum of 
the differences from any other item. 


Suppose the items OM_1, OM_2, . . . OM_N are all laid off from a 
common origin O: 

O    M_1    M_2    M_3    M_4    M_5    M_6    M_7    . . .    M_N 

Choose any one, OM_4, from 
which to take differences. Take any two items, one less than 
OM_4 and one greater than OM_4, say OM_2 and OM_7. 

\[ OM_4 - OM_2 = M_2M_4, \]
\[ OM_7 - OM_4 = M_4M_7; \]

adding, 

\[ M_2M_4 + M_4M_7 = M_2M_7 = OM_7 - OM_2. \]

The sum of these two differences from OM_4 equals the difference 
of the two items taken. 

Take any two items each less than OM_4, say OM_1 and OM_3. 

\[ OM_4 - OM_1 = M_1M_4, \]
\[ OM_4 - OM_3 = M_3M_4. \]

Since 

\[ M_1M_4 > M_1M_3, \]
\[ M_1M_4 + M_3M_4 > M_1M_3. \]

But 

\[ M_1M_3 = OM_3 - OM_1. \]

The sum of these two differences, therefore, is greater than the 
difference of the two items taken. 

Take any two items each greater than OM_4, say OM_5 and OM_6. 

\[ OM_5 - OM_4 = M_4M_5, \]
\[ OM_6 - OM_4 = M_4M_6. \]

Since 

\[ M_4M_6 > M_5M_6, \]
\[ M_4M_5 + M_4M_6 > M_5M_6. \]

But 

\[ M_5M_6 = OM_6 - OM_5. \]

The sum of these two differences, therefore, is greater than the 
difference of the two items taken. It is seen, then, that if the 
items are taken in pairs, one item in each pair being less than 
OM_4 and the other greater than OM_4, the sum of all differences 
will be less than if in some pairs both items are less than OM_4 or 
both greater than OM_4. So in order to make the sum of the 
differences a minimum, OM_4 must be chosen so that all the other 
items may be paired off with one item of each pair less than OM_4 
and the other greater than OM_4. The number of items less than 
OM_4 must, then, equal the number of items greater than OM_4. 
In other words, OM_4 must stand in the middle of the array and 
is the median. 

If there are an even number of items in the array, any value 
between the values of the two middle items will satisfy the 
theorem. The most reasonable value to use in this case is the 
arithmetic mean of these two items. 

The same theorem applies to the more precise definition of the 
median as an ordinate. If N is odd, the ordinate measuring the 
median will coincide with the middle item of the array, and if 
N is even, it will fall halfway between the two middle items. 
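
The theorem may be tried out numerically. The Python sketch below uses a small set of heights that is purely hypothetical, made up for this illustration, and two helper functions whose names are likewise chosen here. It compares the sum of differences taken from the median with the sums taken from every other item, and then shows the even case, where any value between the two middle items serves equally well.

```python
def median(values):
    """Middle item of the array, or the mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def sum_abs_diff(values, center):
    """Sum of differences from `center`, dropping algebraic signs."""
    return sum(abs(v - center) for v in values)


# Hypothetical heights in inches (an odd number of items).
heights = [64, 66, 67, 68, 68, 69, 70, 72, 75]
med = median(heights)
sums = {h: sum_abs_diff(heights, h) for h in heights}
print(med, sums[med], min(sums.values()))

# Hypothetical array with an even number of items.
even_items = [1, 2, 4, 10]
print([sum_abs_diff(even_items, c) for c in (2, 3, 4)])
```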

34. Median is an Average.—The median satisfies the general 
definition of an average. If 

\[ X_1 = X_2 = X_3 = \ldots = X_N = k, \]

and the items be put in array with a line drawn connecting their 
tops, the line will be straight, and parallel to the base line. The 
ordinate at the middle of the array will equal k. Therefore, the 
median equals k, and equals each one of the items. 

35. Three Averages.—The three constants, arithmetic mean, 
mode, and median, are of frequent use as averages in statistical 
analysis. They are each used as type forms, or central values 
around which all the values of the variable group themselves more 
or less closely. 

No matter how carefully samples may be selected from a given 
universe, the average is quite likely to vary somewhat from sample 
to sample. That form of average would be considered most 
representative for which the differences from sample to sample 
are least. 

36. Each a Type Form.—The sum of the deviations from the 
arithmetic mean is zero, warranting its use as a central type form. 
The sum of the differences between the median and each of the 
items is a minimum. This makes the median what may be 
called the size of item of closest fit to the entire set. It is thus a 
central type form. The mode is the size of item of most frequent 
occurrence, which makes it also a type form. 

37. Median from Frequency Curve.—Since the area under the 
frequency curve from the left end up to any ordinate is propor¬ 
tional to the total frequency up to that ordinate, it follows that 
the ordinate which bisects the area stands on the base line at the 
value of the median. In other words, the ordinate at the median 
passes through the center of area. 

38. Arithmetic Mean from Frequency Curve.—Take Fig. 28, 
the rectangular histogram of freshman weights, and note the 
geometric interpretation of the process of computing arithmetic 
mean as shown in Table XXXI. If the width of each rectangle is 
regarded as a unit, then the values of f in the table represent the 
areas of the corresponding rectangles. The values of X are the 
distances of the centers of the rectangles from the vertical axis 
of reference. N = Σf, the total frequency, is the entire area. 
fX for any rectangle equals the moment of its area about the 
vertical axis. 



From physics it is known that the sum of the moments of the 
rectangles about any axis, divided by the sum of all the rec¬ 
tangles, gives the distance of the center of gravity of the entire 
area from that axis. It follows that the ordinate at the point a 
of the base line goes through the center of gravity of the area. 
Since the area under the smooth curve is the limit of area of the 
sum of the rectangles as the width of the rectangles becomes 
indefinitely small, the ordinate through the center of gravity of 
the smooth curve should be at the arithmetic mean of the uni¬ 
verse of values from which the sample was taken. 

The arithmetic mean and the median are then central values 
from a geometric standpoint. 



Geometrically, the mode is located at the point where the great¬ 
est ordinate to the curve stands, the ordinate where the tangent 
to the curve is parallel to the base line. 

39. Geometric Properties Serve as Checks.—These geometric 
properties of the arithmetic mean and the median and mode 
may be used to check the correctness of smoothing the frequency 
diagram. See if the ordinate at M bisects the area under the 
curve, and if the ordinate at a goes through the center of gravity 
of the area. This latter may be ascertained by drawing the curve 
on stiff paper or cardboard. Cut out the figure drawn. Lay it 
on a knife edge with the base line perpendicular to the knife edge, 
and find where it balances. At this point on the base line read the 
value of the arithmetic average as given by the curve. See if it 
checks with the computed value. See if the greatest ordinate 
to the curve stands on the base line at the computed value of 
the mode. 

40. One Average May Be More Representative than Another.—Sometimes 
one average is more representative than another. 
It was shown on page 102 that the arithmetic mean of the height 
of trees in a dooryard was not truly representative. From the 
table of heights it is at once seen that the median is 50 as is also 
the mode. This value of 50 is more representative of these tree 
heights than is 75, the arithmetic mean. 

41. Advantages and Disadvantages of Each.—The size of the 
arithmetic mean is influenced by the sizes of extreme items. 
The median is not influenced by extreme items except in so far 
as items of extreme size be added to the set of items. Since N 
is then changed, the position of the median in the array is changed. 
The items in the central portion of the array are usually all of 
nearly the same size. So the size of the median would be changed 
but little if any. The mode is not at all affected by extreme items 
unless many of them are introduced. 

The arithmetic mean and the median can be determined quite 
closely by formula. Sometimes it is difficult to determine a 
mode. The arithmetic mean and the median always exist. 
Sometimes there is no mode at all. 

The product of arithmetic mean and number of items gives the 
totality of all items. There is no such simple relation for either 
mode or median. 

The median is not so much affected by accidental variations as 
is the arithmetic mean. A low median wage almost surely means 
a group of low wages. Suppose, for example, in a certain shop 
there are 100 men at wages of $2 per day, and three men getting 
$100 each per day. The arithmetic mean wage is nearly $5 per 
day, not a small wage. The median wage is $2, which is more 
representative of wages in that shop, and indicates a low wage 
scale. 
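
The wage illustration can be verified directly. The following Python sketch uses the standard `statistics` module; the list `wages` simply writes out the 103 daily wages described above.

```python
from statistics import mean, median

# 100 men at $2 per day and three men at $100 per day.
wages = [2] * 100 + [100] * 3

print(round(mean(wages), 2))  # arithmetic mean wage in dollars
print(median(wages))          # median wage in dollars
```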

There are some characteristics of certain variables that are hard 
to measure numerically and yet the variates may be ranked, or 
arrayed, with respect to the given characteristic. In such a case 
the arithmetic mean cannot be found, but the variate having the 
median characteristic is determinable. 

It sometimes happens that either the median or the arithmetic 
mean may occur at a point in the distribution where there are 
few or no items. If there is a mode, it may then be a better 
representative type form. 

If the frequency curve has a symmetric bell shape, the arithmetic 
mean, the mode, and the median are equal to each other. 
This can be seen at once from the frequency curve and the 
geometric properties above stated. 

42. Weighted Arithmetic Mean.—A weighted arithmetic mean 
is an arithmetic mean obtained when each of the items or some 
of them have been counted more than once. 

In Table XXXI the column of “size of item” contains 11 numbers. 
In obtaining the arithmetic mean, the first size, 105, 
was counted 15 times, as shown by the corresponding frequency. 
The second size of 116 was counted 43 times, and so on down the 
table. In other words, the size at the midpoint of each class was 
weighted by the corresponding class frequency. To get the arithmetic 
mean the products fX, of size by weight, were added, and 
the sum divided by N, the sum of the weights. The result is 
the weighted arithmetic mean of the sizes X in the first column 
of the table. 

43. Rule.—The process gives the rule for finding weighted 
arithmetic mean. Multiply each value of the variable by its 
weight, add the products, and divide by the sum of the weights. 

If X_1, X_2, X_3, . . . X_N are the values of the variable and 
w_1, w_2, w_3, . . . w_N are the corresponding weights, then the 
weighted arithmetic mean equals 

\[ \frac{w_1X_1 + w_2X_2 + w_3X_3 + \ldots + w_NX_N}{w_1 + w_2 + w_3 + \ldots + w_N} = \frac{\Sigma wX}{\Sigma w}. \]




44. Weighted Observations.—Sometimes in making a set of 
observations, certain ones of them are more reliable or accurate 
than others. Then in getting the average of the set, weights are 
given to the different observations according to their reliability 
and a weighted mean found. For example, a surveyor might 
measure the length of a line by pacing, by using a yardstick, or by 
using a steel tape. The yardstick measurements, if done care¬ 
fully, should have greater weight than the results by pacing. 
The steel-tape measurements should be much better than those 
by either of the other methods and so be given much greater 
weight. When the weights are decided upon, a weighted arith¬ 
metic mean of all measurements would be used as the most 
probable value of the length of the line. 

45. Weighted Grades.—When a student is graded in a certain 
subject, his final examination grade may be given a weight of 1, 
his recitation work a weight of 2, and his laboratory work a 
weight of 3. Suppose his final examination mark is 60, his reci¬ 
tation grade is 80, and his laboratory work is 95. His recorded 
grade will be the weighted arithmetic mean, 

\[ \frac{60 \times 1 + 80 \times 2 + 95 \times 3}{1 + 2 + 3} = \frac{505}{6} = 84\tfrac{1}{6}. \]

The grade of 95 is not as much above 80 as 60 is below it; yet 
its greater weight brings the final average considerably above 80. 
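
The rule for the weighted arithmetic mean translates into a very short function. The Python sketch below is an illustration only, with the name `weighted_mean` chosen here. It is applied to the three grades and weights of the example.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, add, and divide by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Final examination, recitation, and laboratory grades with weights 1, 2, 3.
grade = weighted_mean([60, 80, 95], [1, 2, 3])
print(grade)
```

With all weights equal the function returns the ordinary arithmetic mean, which is a convenient check on any such routine.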

46. Geometric Mean.—Another average of not uncommon use 
is the geometric mean. The geometric mean of n quantities is the 
nth root of their product. The geometric mean of X_1, X_2, 
. . . X_n is 

\[ \sqrt[n]{X_1 X_2 X_3 \ldots X_n}. \]

The G. M. of 4 and 9 is 

\[ \sqrt{4 \times 9} = 6. \]

The G. M. of 2, 4, and 8 is 

\[ \sqrt[3]{2 \times 4 \times 8} = 4. \]

The G. M. of 2, 4, 8, 16, and 32 is 

\[ \sqrt[5]{2 \times 4 \times 8 \times 16 \times 32} = 8. \]

Write the series 2, 4, 8, 16, 32 

as 2^1, 2^2, 2^3, 2^4, 2^5. 



This is a geometric progression with common ratio of 2. The 
exponents of 2 form an arithmetic progression with common 
difference of 1. In general, a geometric progression of n terms is 
written 

\[ a, ar, ar^2, ar^3, \ldots, ar^{n-1}. \]

The exponents of r, 

\[ 0, 1, 2, 3, \ldots, n - 1, \]

always form an arithmetic progression. 

The G. M. of X_1, X_2, X_3, . . . X_N is 

\[ \sqrt[N]{X_1 X_2 X_3 \ldots X_N}, \]

\[ \log \text{G. M.} = \frac{\log X_1 + \log X_2 + \log X_3 + \ldots + \log X_N}{N}. \]

That is, the logarithm of the G. M. of a set of numbers is the 
A. M. of their logarithms. This makes a ready method of com¬ 
puting the G. M. of a set of numbers. Take the arithmetic mean 
of their logarithms and look up its antilogarithm. 
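
The logarithmic method of computing the G. M. can be imitated in Python, with the `math` module taking the place of the table of logarithms. The function name `geometric_mean` is chosen here for illustration, and the sketch assumes that every value is positive.

```python
import math


def geometric_mean(values):
    """Antilogarithm of the arithmetic mean of the common logarithms."""
    logs = [math.log10(v) for v in values]  # requires every value > 0
    return 10 ** (sum(logs) / len(logs))


for series in ([4, 9], [2, 4, 8], [2, 4, 8, 16, 32]):
    print(series, round(geometric_mean(series), 6))
```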

47. Connection between G. M. and A. M.—These con¬ 
siderations show an intimate connection between geometric 
mean and arithmetic mean. To any property of the G. M. there 
is a corresponding property of the A. M. The G. M. applies to 
a series of terms that are to be multiplied together, as A. M. 
applies to a series of terms to be added together. G. M. may be 
applied to interpolation when there is a uniform rate of change, 
and A. M. when there is a uniform absolute change. G. M. 
applies to compound interest, as A. M. applies to simple interest; 
G. M. to a series of relative prices, and A. M. to a series of price 
increases. 

48. Illustration from Interest.—The compound amount of $1 at 
5 per cent compounded annually for 10 years is (1.05)^10 = $1.63. 
The compound amount for 20 years is (1.05)^20 = $2.65. The 
compound amount for 15 years, the middle of the period from 
10 to 20 years, is the G. M. of $1.63 and $2.65, 

\[ \sqrt{1.63 \times 2.65} = \sqrt{4.3195} = 2.08 = (1.05)^{15}. \]

The G. M. of (1.05)^10 and (1.05)^20 is 

\[ \sqrt{(1.05)^{10} \times (1.05)^{20}} = \sqrt{(1.05)^{30}} = (1.05)^{15}. \]




The simple amount of $1 at 5 per cent per annum for 10 years is 

\[ 1 + 10 \times .05 = 1.50. \]

For 20 years, 

\[ 1 + 20 \times .05 = 2.00. \]

The A. M. of $1.50 and $2 gives the simple amount for 15 
years, the middle of the period from 10 to 20 years. 

\[ \frac{1.50 + 2.00}{2} = 1.75. \]

Simple amount for 15 years is 

\[ 1 + 15 \times .05 = 1.75. \]

49. Average Rate of Growth.—Suppose the population of a 
certain city is 100,000 in 1890. The census of 1900 shows a 12½
per cent increase. The census of 1910 shows a 50 per cent 
increase. The census of 1920 shows a 100 per cent increase. 
What has been the average per cent increase per decade ? 

The average per cent increase applied each decade would give
the 1920 population the same as given by the above census
figures. The census figures give

Date          1890       1900       1910       1920
Population    100,000    112,500    168,750    337,500

\[ 100,000(1.125 \times 1.50 \times 2.00) = 337,500. \]

Let x be the average rate per decade, expressed as a decimal; then

\[ 100,000\{(1 + x) \times (1 + x) \times (1 + x)\} = 337,500, \]
\[ 100,000(1 + x)^3 = 337,500. \]

Then

\[ (1 + x)^3 = 1.125 \times 1.50 \times 2.00, \]
\[ 1 + x = \sqrt[3]{1.125 \times 1.50 \times 2.00}, \]
\[ 1 + x = \sqrt[3]{3.375}, \]
\[ 1 + x = 1.50, \]
\[ x = .50 = 50 \text{ per cent}. \]

The required average per cent increase is x = 50 per cent per
decade. The ratio 1 + x = 1.50 is the G. M. of 1.125, 1.50, and 2.00.
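
The computation above can be checked with a short Python sketch. The function name `average_growth_rate` is an assumption made for illustration; the growth factors 1.125, 1.50, and 2.00 and the 1890 population are those of the example.

```python
def average_growth_rate(growth_factors):
    """Constant rate per period that reproduces the same total growth.

    The result is the geometric mean of the growth factors, less 1.
    """
    product = 1.0
    for factor in growth_factors:
        if factor <= 0:
            raise ValueError("growth factors must be positive")
        product *= factor
    return product ** (1.0 / len(growth_factors)) - 1.0

# Growth factors per decade from the census figures of the example.
factors = [1.125, 1.50, 2.00]
rate = average_growth_rate(factors)

# Applying the average rate three times to the 1890 population
# should lead back to the 1920 census figure.
population_1920 = 100_000 * (1 + rate) ** 3
print(round(rate, 4), round(population_1920))
```

Note that the arithmetic mean of the three per cent increases (12½, 50, and 100) would be about 54 per cent, which applied three times would overshoot the 1920 figure; this is why the geometric mean of the factors is used.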

50. Interpolation.—Suppose the population of a certain city in
1910 was 100,000 and in 1920, 144,000. What was the population
in 1915?









If the population increased the same number of people each
year, the population in 1915 was the A. M. of 100,000 and 144,000,
or

\[ \frac{100,000 + 144,000}{2} = 122,000. \]

If the population each year increased at the same per cent of the
population at the beginning of the year, then the population in
1915 was the G. M. of 100,000 and 144,000, or

\[ \sqrt{100,000 \times 144,000} = 120,000. \]

The two ways of increase are shown graphically in Fig. 37.
The straight line ABC is the population graph if the change
from year to year was the same amount. The curve AB'C is the
population graph if the per cent increase was the same each
year. The ordinates to ABC at the different dates form an
arithmetic, or constant-difference, progression. The ordinates
to AB'C at the different dates form a geometric, or constant-ratio,
progression. In dealing with differences, an arithmetic mean
is likely to be useful; in dealing with ratios, a geometric mean.

Fig. 37.—Increase of population. Interpolation. (ABC, uniform increase; AB'C, uniform rate of increase; horizontal axis, date.)
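
The two kinds of interpolation may be sketched in Python as follows. The function names are assumptions chosen for illustration; the populations are those of the example. The last function lists the year-by-year values on the constant-rate curve, so the 1915 value can be read off directly.

```python
import math

def arithmetic_midpoint(start, end):
    """Midpoint when the same amount is added each year (A. M.)."""
    return (start + end) / 2

def geometric_midpoint(start, end):
    """Midpoint when the same per cent is added each year (G. M.)."""
    return math.sqrt(start * end)

def yearly_values_constant_rate(start, end, years):
    """Values for each year when the yearly ratio is constant."""
    ratio = (end / start) ** (1 / years)
    return [start * ratio ** k for k in range(years + 1)]

pop_1910, pop_1920 = 100_000, 144_000
print(arithmetic_midpoint(pop_1910, pop_1920))
print(round(geometric_midpoint(pop_1910, pop_1920)))

# Entry 5 of the constant-rate series is the 1915 population.
series = yearly_values_constant_rate(pop_1910, pop_1920, 10)
print(round(series[5]))
```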





51. G. M. is an Average.—The geometric mean satisfies the
general definition of an average. If each of the N items equals k,

\[ \text{G. M.} = \sqrt[N]{X_1 X_2 X_3 \cdots X_N} = \sqrt[N]{k^N} = k, \]

which equals each of the items.

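52. Harmonic Mean.—Another kind of average occasionally
used is the harmonic mean. The harmonic mean of a series of
numbers is the reciprocal of the arithmetic mean of their reciprocals.
In algebraic symbols,

\[ \text{H. M. of } X_1, X_2, X_3, \ldots, X_N = \frac{1}{\dfrac{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dfrac{1}{X_3} + \cdots + \dfrac{1}{X_N}}{N}}. \]

This reduces to

\[ \frac{N}{\sum \dfrac{1}{X}}. \]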



The harmonic mean of a set of items may be defined as the 
number of items divided by the sum of their reciprocals. 

53. Illustrations.—Suppose a man in traveling 3 miles goes
the first mile at the rate of 8 miles per hour, the second mile at the
rate of 10 miles per hour, and the third mile at the rate of 12 miles
per hour. What is his average speed in miles per hour?

The first mile takes 1/8 hr.,
The second mile takes 1/10 hr.,
The third mile takes 1/12 hr.,
The three miles take 1/8 + 1/10 + 1/12 = 37/120 hr.

Then the average time per mile is

1/3 of 37/120 = 37/360 hr.

So the average miles per hour is 360/37 = 9.73.

It is seen that the reciprocals of 8, 10, and 12 were added, the
sum divided by 3, the number of items, and the reciprocal of the
result taken. In other words, the average speed in miles per
hour is the harmonic mean of 8, 10, and 12.
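
The steps of this illustration can be followed exactly with Python's `fractions` module, which keeps the reciprocals as common fractions. The function name `harmonic_mean` is an assumption for illustration; the speeds are those of the example.

```python
from fractions import Fraction

def harmonic_mean(values):
    """Number of items divided by the sum of their reciprocals."""
    reciprocal_sum = sum(Fraction(1) / Fraction(v) for v in values)
    return len(values) / reciprocal_sum

# Speeds in miles per hour over three successive miles.
speeds = [8, 10, 12]
hm = harmonic_mean(speeds)

# Total time for the three miles, in hours, for comparison with the text.
total_time = sum(Fraction(1, s) for s in speeds)

print(total_time)           # hours for the three miles
print(hm, round(float(hm), 2))
```

Because equal distances are covered at each speed, the harmonic mean is the appropriate average here; the arithmetic mean of 8, 10, and 12 would give 10 miles per hour, which is too high.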

Suppose sugar is 12 lb. for $1, flour, 20 lb. for $1, and potatoes,
25 lb. for $1, and it is necessary to find the average price per
pound. If the average price is obtained by using the arithmetic
mean of 12, 20, and 25, the result will be the harmonic mean of
prices. Thus:

\[ \frac{12 + 20 + 25}{3} = 19. \]

The A. M. of the number of pounds for $1 is 19; 19 lb. for $1 is
$1/19 per pound.

The prices per pound are: sugar, 8 1/3 cts., flour, 5 cts., potatoes,
4 cts.

The H. M. of these prices is

\[ \frac{3}{\dfrac{1}{8\frac{1}{3}} + \dfrac{1}{5} + \dfrac{1}{4}} = \frac{300}{57} = \frac{100}{19} \text{ cts.} = \$\frac{1}{19} = 5\tfrac{5}{19} \text{ cts.} \]

The A. M. of prices per pound is

\[ \frac{\dfrac{1}{12} + \dfrac{1}{20} + \dfrac{1}{25}}{3} = \$\frac{52}{900} = 5\tfrac{7}{9} \text{ cts.} \]

If this method is used to get the average number of pounds for
$1, the result will be:

\[ \frac{900}{52} = 17\tfrac{4}{13} \text{ lb. for } \$1. \]

This is the H. M. of 12, 20, and 25, the respective number of
pounds for $1.

54. Relations between A. M., G. M., and H. M.—The arith¬ 
metic mean, geometric mean, and harmonic mean are related 
mathematically. If only two numbers, \( X_1 \) and \( X_2 \), are involved,
their G. M. is the geometric mean of their A. M. and H. M.

A. M. of \( X_1 \) and \( X_2 \) is

\[ \frac{X_1 + X_2}{2}. \]

H. M. of \( X_1 \) and \( X_2 \) is

\[ \frac{2}{\dfrac{1}{X_1} + \dfrac{1}{X_2}} = \frac{2 X_1 X_2}{X_1 + X_2}. \]

Geometric mean of A. M. and H. M. is

\[ \sqrt{\frac{X_1 + X_2}{2} \cdot \frac{2 X_1 X_2}{X_1 + X_2}} = \sqrt{X_1 X_2} = \text{G. M. of } X_1 \text{ and } X_2. \]

It has been proved that in any series of positive numbers that are
not all equal

\[ \text{H. M.} < \text{G. M.} < \text{A. M.} \]
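
A numerical check of these relations for two numbers might look like the following Python sketch. The function name `three_means` is an assumption; the pair 100,000 and 144,000 is borrowed from the interpolation example of Sec. 50.

```python
import math

def three_means(x1, x2):
    """Return (A. M., G. M., H. M.) of two positive numbers."""
    am = (x1 + x2) / 2
    gm = math.sqrt(x1 * x2)
    hm = 2 * x1 * x2 / (x1 + x2)
    return am, gm, hm

am, gm, hm = three_means(100_000, 144_000)

# The G. M. should equal the geometric mean of the A. M. and the H. M.,
# and the three means should stand in the order H. M. < G. M. < A. M.
print(round(am), round(gm), round(hm))
print(abs(gm - math.sqrt(am * hm)) < 1e-6, hm < gm < am)
```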





55. H. M. is an Average.—The harmonic mean satisfies the
general definition of average. If each of the N items equals k,

\[ \text{H. M.} = \frac{N}{\sum \dfrac{1}{X}} = \frac{N}{N/k} = k, \]

equal to each of the items.

Arithmetic mean, geometric mean, and harmonic mean are all 
used in formulas for index numbers. 1 

56. Other Averages.—Besides these averages an infinite
number of different averages satisfying the general definition
might be devised. The square root of the arithmetic mean of
the squares of a set of numbers is an example,

\[ \sqrt{\frac{\sum X^2}{N}}. \]

It is called the root-mean-square of the set of numbers. In the 
next chapter the root-mean-square of deviations from A. M., 
called standard deviation, will be introduced. 


Exercises 

1. Find the arithmetic mean of the following numbers and check that the 
sum of their deviations from the mean is zero: 

17, 43, 21, 18, 104, 62, 47, 96, 117, 8. 

2. Find the deviations of the numbers in Ex. 1 from 50 and check that 
their arithmetic mean added to 50 gives the arithmetic mean of the numbers 

themselves. 

3. Using the frequency table of mean monthly temperatures that was 
made from Table IV in Ex. 2, Chap. II, find the arithmetic mean monthly 
temperature, by the direct method, for the period of time covered. 

4. Find the same mean monthly temperature by the short-cut method.

5. Using the frequency table of monthly precipitation that was made
from Table III in Ex. 3, Chap. II, find the arithmetic mean monthly
precipitation, by the direct method, for the period of time covered.

6. Find the same mean monthly precipitation by the short-cut method.

7. Find by formula the modal monthly temperature, using the table
made in Ex. 2, Chap. II. How much does it differ from the arithmetic
mean?

8. Find by formula the modal monthly precipitation, using the table
made in Ex. 3, Chap. II. How much does it differ from the arithmetic
mean?

9. Find the median monthly temperature, using the table made in
Ex. 2, Chap. II.

1 Prof. Irving Fisher, “The Making of Index Numbers.”




10. Find the median monthly precipitation, using the table made in
Ex. 3, Chap. II.

11. Check that the sum of the differences of the following set of numbers
from their median is less than the sum of the differences from some other
number chosen at random:

47, 63, 87, 99, 21, 27, 35, 38, 55. 

12. The arithmetic mean of a set of 20 numbers is 75. What is the sum
of these numbers?

13. In chemistry a student was graded 60 in final examination, 90 in 

recitation, and 80 in laboratory. These grades were weighted 1, 2, and 3, 
respectively. Find the student’s average grade. If the weights had been 
1, 3, and 6, respectively, what change would it have made in the final mark? 

What if the weights had been 1, 5, and 10? 

14. From the table of population of the state of Washington, given in 
Ex. 10, Chap. Ill, determine the average rate of increase per decade for the 
period covered by the table. 

15. Determine the average increase in population per decade in the state
of Washington for the same period of time. 

16. The number of bacteria in a certain culture was found to be 4 X 10* 
at noon of one day. At noon the next day the number was found to be 
9 X 10*. If the number increased at a constant rate per hour, how many bac¬ 
teria were there at midnight? 

17. If a man travels 6 miles at a speed of 4 miles per hour for the first 3 
miles, 3 miles per hour for the next 2 miles, and 2 miles per hour for the last 
mile, what is his average speed in miles per hour? 

18. A man has $10,000 invested at 5 per cent simple interest, $5,000 at
6 per cent, and $1,000 at 10 per cent. What is his average rate of interest? 

19. What relation must exist between a and b in order that their arith¬ 
metic mean, geometric mean, and harmonic mean may all be equal? 

20. At the end of 5 years a certain sum of money placed at 6 per cent per 
annum, compounded annually, amounts to $1,338.23. At the end of 15
years it amounts to $2,396.56. What is the amount at the end of 10 years? 

21. At the end of his first quarter in college a freshman had credits as 
follows: 5 hours of trigonometry with a grade of 97, 4 hours of English with 
a grade of 88, 3 hours of history with a grade of 85, and 4 hours of physics 
with a grade of 89. What was his average grade per hour of credit? 

22. In a certain park there are 4 alder trees each 45 ft. high, 10 maples 
each 50 ft. high, 2 hemlocks each 200 ft. high, and 4 Douglas firs each 250 
ft. high. What is the average tree height of the trees in this park ? What 
kind of average do you think is fairest to use? Is it fair to group these tree 
heights in one group? 

23. The population of a certain city increased in 10 years from 335,000 to 
495,000. What was the average increase per year? What was the average 
annual rate of increase? 

24. Find from Table XXIII the arithmetic mean, mode, and median leaf 
length. Arrange them in order of size. 

25. Find from Table XXIV the arithmetic mean, mode, and median leaf
breadth. Arrange them in order of size.




26. In Table XV what was the average number of heads per throw? Is 
the arithmetic mean a good average to use? What is the significance of the 
fractional value in the arithmetic mean? 

27. From the table made in Ex. 19, Chap. II, determine the arithmetic 
mean, mode, and median leaf length. 

28. From the table made in Ex. 20, Chap. II, determine the arithmetic 
mean, mode, and median leaf breadth. 



CHAPTER V 


DISPERSION 

Two frequency distributions may be alike in every respect 
except that the average of one is greater than the average of the 
other. Such are the distributions shown by the curves in Figs. 
38 and 39. Such distributions may be compared by comparing 
their averages. 

Two frequency distributions may have exactly the same average,
but the variation of one with respect to the average may
be quite different from that of the other. The two curves in Fig.
40 show such distributions. For many purposes sufficient information
is not given for comparison of these variables by saying
that they have the same average. Some measure of the variation
should be devised by means of which the two variables may
be compared.




Two frequency distributions may have different averages and
also different variation with respect to the average. The curves
of Fig. 41 show two such distributions. In comparing these two
variables both the average and the measure of variation must be
considered.

1. Dispersion. It thus appears that, in order to compare two 
variables, the type form or average is not sufficient to make the 
comparison complete. It is necessary to know to what extent 
the values of the variable tend to concentrate about the average, 
or, putting it in other words, to what degree they vary or scatter
from the average. This deviation or “scatteration” of the items
is called dispersion. The closer the items are to the average the 
less the scatteration, the less the dispersion. The more scattered 
away from the average the items the greater the dispersion. 
It is desired, of course, to get a numerical measure of dispersion. 

2. Range as a Measure.—Just as there are various kinds of 
type form or average, there are also various measures of disper¬ 
sion. The first measure that may suggest itself is the range or 
difference between the largest and the smallest items. Consider 
the four groups of values given in Table XXXVII. 


Table XXXVII.—Groups of Values of Four Variables

                 Group III              Group IV
               X       |d_a|          X       |d_a|
              612        9
              615        6
              619        2
              621        0
              623        2
              627        6
              630        9
Totals       4,347      34          4,347      40
Averages      621      4.86          621      5.71


Certain constants for each group are tabulated in Table
XXXVIII.

Table XXXVIII.—Constants of the Four Groups

Group    Range    Average, a    Range ÷ Average    \( \delta \)    \( \delta \div a \)
I          2          69             2/69            0.57       0.008
II        18          69            18/69            5.71       0.083
III       18         621             2/69            4.86       0.008
IV        18         621             2/69            5.71       0.009


In group I the range is 2. In group II it is 18. It seems quite
evident, then, that there is much greater scatteration, dispersion,
in group II than in group I. Groups II and III have the same
range. In group II, however, the range of 18 is with an average 
of 69, while in group III the range of 18 is with an average of 621. 
Manifestly, a difference of 18 among things which average 621 
is much less noticeable than a difference of 18 among things which 
average only 69. If the difference in height between the tallest 
man and the shortest man in a military company is 2 in., it would 
be said that the height of men in that company is very uniform. 
There is hardly any variation in height. If it is found that the 
difference between the longest nose and the shortest nose of the 
men in the company is 2 in., it would be said that there is very 
great variation in length of noses. 

3. Relative Values.—Unless the variables compared have 
about the same average, the range must be compared with the 
average in each case to give a fair idea of the degree of variation. 
The fourth column of Table XXXVIII gives the ratio of range to 
average for each of the four groups. In groups I and II, having 
the same average, these ratios will have the same ratio to each 
other as did the ranges. One is nine times the other. In groups
I and III the ratios of range to average are the same. A range of
2 in things which average 69 is as much, relatively, as a range of
18 in things which average 621. In groups III and IV, having 
the same range and the same average, the ratios of range to aver¬ 
age are equal. 

The range has been used here as a measure of dispersion. The 
ratio of range to average would be called a coefficient of dispersion. 
Groups II, III, and IV have the same measure of dispersion, 18, 
which is nine times that of group I. Groups I, III, and IV 
have the same coefficient of dispersion, which is one-ninth 
that of group II. 

4. Range May Not Be a Good Measure.—A little considera¬ 
tion will reveal the fact that range is not a very good measure of 
dispersion. Suppose that in a military company the tallest man 
is 70 in. tall and the shortest 68 in. Let these two men be replaced 
by men 71 in. and 67 in. high, respectively. The range is at once 
doubled, changing from 2 to 4 in. The average height has not 
been changed. The ratio of range to average has been doubled. 
The dispersion certainly has been altered but little, though both 
the measure of dispersion and the coefficient have been doubled. 
If the heaviest man in a town weighs 160 lb. and the lightest 
weighs 140 lb., the range in weight is 20 lb. If, now, a fat man
weighing 260 lb. moves into the town, the range is suddenly
increased to 120 lb., six times what it was before. The average 
weight would be increased but little. Taking the town as a whole, 
the variation in weights has changed but little. In a sample of 
leaves from a tree, the difference between the lengths of the longest 
and shortest leaves may be 32 mm., as in Table XX. It is not 
unlikely that there are a few stray leaves on the tree shorter than 
any in the sample or a few longer than any in the sample. A second 
sample might have a much greater range than the first, and yet 
both samples be good representations of the entire universe from 
which they were selected. A measure so seriously affected by 
what may be called accidental variations can hardly be con¬ 
sidered a good measure. 

5. Deviations from the Average.—To get the coefficient of dispersion
the range was compared with the average. The deviation
of the extreme item from the average might have suggested itself 
as a measure of dispersion. It is at once seen that there is the 
same objection that there is to the range. If, however, the 
deviation of each item from the average can be considered in 
the measure of dispersion, a much more satisfactory measure 
will result. This may be illustrated by the items in groups III 
and IV of Table XXXVII. In group III it is seen that the next 
item larger and the next smaller than the average, 623 and 619, 
differ from the average a less amount than do the corresponding 
items, 625 and 617, in group IV. The same is true for the next 
items beyond these. Though the largest and smallest items in 
each group differ from the average by the same amount, yet it is 
evident that in the group as a whole, the items of group III cluster 
around the average more closely than do those of group IV.
The dispersion in group III is then less than that in group IV.
The scatteration is greater in group IV than in group III. 

6. Average Deviation.—The average deviation from the average
of the group at once suggests itself as a measure of dispersion.
The closer the items cluster about the average, the smaller will
be the average deviation. The more they are scattered away
from the average, the greater will be the average deviation.
In Table XXXVII the deviations from the average are tabulated
for each group. The columns are headed \( d_a \). In summing the
deviations, their absolute values are used, since positive and
negative deviations of the same size would have the same influence
on the degree of scatteration. The average absolute deviations
\( \delta \) are tabulated in Table XXXVIII.

\[ \delta = \frac{\sum |d|}{N}. \]



7. Coefficient.—Groups II and IV have the same average
deviation of \( \delta = 5.71 \). But an average deviation of 5.71 among
things whose average size is 69 is much greater than an average
deviation of 5.71 among things whose average size is 621. So
again, when the average sizes differ materially, it is necessary to
compare average deviation with average size of item to get a
coefficient of dispersion.

\[ \text{Coefficient of dispersion} = \delta / a. \]
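
As a sketch of the computation, the average deviation and its coefficient for group III of Table XXXVII can be found in Python as below. The function name `average_deviation` is an assumption; the seven values are the group III items from the table.

```python
def average_deviation(values, center=None):
    """Mean of the absolute deviations from `center`.

    If no center is given, the arithmetic mean of the values is used.
    """
    n = len(values)
    if center is None:
        center = sum(values) / n
    return sum(abs(x - center) for x in values) / n

# Group III of Table XXXVII.
group_iii = [612, 615, 619, 621, 623, 627, 630]

a = sum(group_iii) / len(group_iii)   # arithmetic mean
delta = average_deviation(group_iii)  # average deviation from the mean
coefficient = delta / a               # coefficient of dispersion

print(a, round(delta, 2), round(coefficient, 3))
```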

8. The Average Used.—The average used from which to take
deviations may be either arithmetic mean, median, or mode.
Whichever is used, the corresponding subscript may be attached
to \( \delta \). To get the coefficient of dispersion, \( \delta \) is divided by that
average from which \( \delta \) was derived.

\[ \delta_a = \frac{\sum |X - a|}{N} = \frac{\sum |d_a|}{N}, \]
\[ \delta_M = \frac{\sum |X - M|}{N} = \frac{\sum |d_M|}{N}, \]
\[ \delta_Z = \frac{\sum |X - Z|}{N} = \frac{\sum |d_Z|}{N}. \]

The corresponding coefficients are

\[ \frac{\delta_a}{a}, \quad \frac{\delta_M}{M}, \quad \text{and} \quad \frac{\delta_Z}{Z}. \]



9. Examples from Frequency Tables.—Working from a frequency
table, let it be required to find \( \delta_a \) and \( \delta_M \) for freshman
weights from Table XIII.





Table XXXIX.—Average Deviation from Arithmetic Mean of Freshman Weights

Size of item,
in pounds      Frequency
    X              f            fX          d_a          f d_a
   105            15          1,575       -37.25       -558.75
   116            43          4,988       -26.25     -1,128.75
   127           138         17,526       -15.25     -2,104.50
   138           162         22,356        -4.25       -688.50
   149           129         19,221         6.75        870.75
   160            82         13,120        17.75      1,455.50
   171            35          5,985        28.75      1,006.25
   182            16          2,912        39.75        636.00
   193             5            965        50.75        253.75
   204             3            612        61.75        185.25
   215             1            215        72.75         72.75
  Total          629         89,475                   8,960.75 (sum of absolute values)

\[ a = \frac{89,475}{629} = 142.25. \]
\[ \delta_a = \frac{8,960.75}{629} = 14.24. \]
\[ \frac{\delta_a}{a} = \frac{14.24}{142.25} = 0.1001. \]


Table XL.—Average Deviation of Freshman Weights from the Median

Size of item,
in pounds      Frequency
    X              f           d_M         f|d_M|
   105            15         -35.55        533.25
   116            43         -24.55      1,055.65
   127           138         -13.55      1,869.90
   138           162          -2.55        413.10
   149           129           8.45      1,090.05
   160            82          19.45      1,594.90
   171            35          30.45      1,065.75
   182            16          41.45        663.20
   193             5          52.45        262.25
   204             3          63.45        190.35
   215             1          74.45         74.45
  Total          629                     8,812.85

\[ M = 140.55 \text{ (see p. 116, Chap. IV)}. \]
\[ \delta_M = \frac{8,812.85}{629} = 14.01. \]
\[ \frac{\delta_M}{M} = \frac{14.01}{140.55} = 0.099. \]


\( \delta_M \) must be less than \( \delta_a \), since, as has been proved, the sum of
the absolute values of the deviations from the median is a mini¬ 
mum. The median is, for this reason, more logical to use for 
average deviation than one of the other averages. 
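
The two average deviations of the freshman weights can be recomputed from the class midpoints and frequencies with a short Python sketch. The function name `grouped_average_deviation` is an assumption. The median 140.55 is taken from the text rather than recomputed, and the mean is carried at full precision, so the results may differ in the last decimal from the tabulated work, which uses the rounded mean 142.25.

```python
# Class midpoints (pounds) and frequencies of the freshman weights.
weights = [105, 116, 127, 138, 149, 160, 171, 182, 193, 204, 215]
freqs = [15, 43, 138, 162, 129, 82, 35, 16, 5, 3, 1]

def grouped_average_deviation(xs, fs, center):
    """Average absolute deviation from `center` for a frequency table."""
    n = sum(fs)
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / n

n = sum(freqs)
mean = sum(f * x for x, f in zip(weights, freqs)) / n
median = 140.55  # value quoted in the text, not recomputed here

delta_a = grouped_average_deviation(weights, freqs, mean)
delta_m = grouped_average_deviation(weights, freqs, median)

print(n, round(mean, 2))
print(round(delta_a, 3), round(delta_a / mean, 4))
print(round(delta_m, 3), round(delta_m / median, 4))
```

The deviation from the median comes out smaller than the deviation from the mean, in agreement with the minimum property stated above.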

10. Standard Deviation.—In order to avoid the difficulty of 
negative deviations, each deviation may be squared. The squares 
of the deviations are then added, the sum divided by the number 
of observations, and the square root of the quotient taken. 
This result is root-mean-square of the deviations. If there are 
N observations, and deviations are taken from some number 
X', the root-mean-square deviation is

\[ s = \sqrt{\frac{\sum d_{X'}^2}{N}}, \]

in which \( d_{X'} \) is deviation from X'. Root-mean-square deviation
from the arithmetic mean is known as standard deviation. Its
mathematical properties make it of great importance in the study
of statistical distributions and the theory of errors. The standard
deviation is

\[ \sigma = \sqrt{\frac{\sum d_a^2}{N}}, \]

where \( \sigma \) is read sigma or standard deviation.

11. Relation between s and \( \sigma \).—The following relation always
holds between s and \( \sigma \),

\[ s^2 = \sigma^2 + {x'}^2, \]

where

\[ x' = a - X' = \frac{\sum d_{X'}}{N}. \]

Proof:

\[ s^2 = \frac{\sum (X - X')^2}{N}, \]
\[ {x'}^2 = (a - X')^2, \]
\[ s^2 - {x'}^2 = \frac{\sum (X - X')^2 - N(a - X')^2}{N} \]
\[ = \frac{\sum X^2 - 2X'\sum X + N{X'}^2 - Na^2 + 2NaX' - N{X'}^2}{N} \]
\[ = \frac{\sum X^2 - Na^2}{N} \qquad \text{since } Na = \sum X, \]
\[ = \frac{\sum (d_a + a)^2 - Na^2}{N} \qquad \text{since } d_a = X - a, \]
\[ = \frac{\sum d_a^2 + 2a\sum d_a + Na^2 - Na^2}{N} \]
\[ = \frac{\sum d_a^2}{N} \qquad \text{since } \sum d_a = 0, \]
\[ = \sigma^2 \qquad \text{by definition.} \]
\[ s^2 = \sigma^2 + {x'}^2. \]

Multiplying both sides of

\[ \sigma^2 = s^2 - {x'}^2 \]

by N gives

\[ N\sigma^2 = Ns^2 - N{x'}^2. \]

\( Ns^2 \) is the sum of the squares of deviations from any arbitrary
number X'. \( N\sigma^2 \) is the sum of the squares of deviations from the
arithmetic mean. Therefore, from the last equation above, since
\( N{x'}^2 \) is never negative, it is true that:

The sum of the squares of deviations from the arithmetic mean is a
minimum. So the arithmetic mean is the logical average from
which to measure dispersion when the squares of deviations are used.

Fig. 42.
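
The relation between s and the standard deviation can be verified numerically. The sketch below is an illustration only: the function name `rms_deviation` and the arbitrary origin X' = 600 are assumptions, and the data are the group III items of Table XXXVII.

```python
import math

def rms_deviation(values, origin):
    """Root-mean-square deviation of the values from `origin`."""
    n = len(values)
    return math.sqrt(sum((x - origin) ** 2 for x in values) / n)

values = [612, 615, 619, 621, 623, 627, 630]
a = sum(values) / len(values)       # arithmetic mean
sigma = rms_deviation(values, a)    # standard deviation

origin = 600                        # an arbitrary X', chosen for illustration
s = rms_deviation(values, origin)   # root-mean-square deviation from X'
x_prime = a - origin                # deviation of the mean from X'

# s^2 should equal sigma^2 + x'^2, so s can never be less than sigma.
print(round(s ** 2, 4), round(sigma ** 2 + x_prime ** 2, 4))
```

Trying other origins in place of 600 shows the same agreement, with s smallest when the origin is the arithmetic mean itself.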

12. Relation Graphically.—Knowing the values of \( \sigma \) and x',
the value of s may at once be obtained graphically, as in Fig. 42.
On the x-axis, at the value of the arithmetic mean, A,
erect a perpendicular, AS, equal to \( \sigma \), measured to the scale of the
x-axis. On the x-axis lay off x' from A to the point X'. Draw
X'S. Since

\[ (X'S)^2 = (AS)^2 + (AX')^2, \]
\[ (X'S)^2 = \sigma^2 + {x'}^2. \]

Therefore,

\[ X'S = s. \]

Conversely, if s and x' are known, \( \sigma \) can at once be determined
graphically.

13. Other Names.—Standard deviation, \( \sigma \), is also known as
dispersion, index of variability, mean error, root-mean-square
error, etc. In this constant it is always understood that deviations
are taken from the arithmetic mean.

14. A Measure of Consistency.—Since the deviations are
squared, large deviations have more influence in determining \( \sigma \)
than in average deviation. This is, to some extent, offset by taking
the square root of the average square of deviations. In
measuring physical quantities, the presence of many large deviations
from the arithmetic mean would reveal inconsistency in
the observed measurements. So \( \sigma \) is used in measuring consistency
of the set of observations. Suppose that the surveyor,
in taking a number of measurements of the length of a line by a
yardstick, finds that the standard deviation is larger than when a
steel tape is used. He at once concludes that the tape measurements
are more consistent than the yardstick measurements.
The same conclusions may be drawn in regard to any variable
that approximately follows a normal distribution.

15. The mechanical interpretation of standard deviation is that
it is the radius of gyration of the area under the frequency curve
about the ordinate through the center of gravity of that area.
This can be readily seen by those familiar with the meaning of
radius of gyration.

The radius of gyration of a rectangular histogram about an 
axis through its center of gravity is the square root of the sum of 
the products of the area of each rectangle multiplied by the square 
of the distance of its center from the axis through the center of 
gravity, divided by the total area. The width of each rectangle 
being equal to one class interval may be regarded as unity. 
The areas of the rectangles are then equal to class frequencies. 
The ordinate at the arithmetic mean passes through the center of 
gravity of the entire area and so may be used as the axis of gyra¬ 
tion. The distances of centers of rectangles from the axis are 
deviations of the midpoints of classes from the arithmetic mean. 

Then the product of the area of a rectangle by the square of the
distance from the axis is the product of frequency by square of
deviation from arithmetic mean. The sum of these products is
the sum of the squares of deviations from the arithmetic mean for
every item in the set. The entire area is total frequency. So the 
quotient of the sum of the products of the area of each rectangle 
times the square of its distance from the axis, divided by the total 
area, equals the average square deviation from the arithmetic 
mean. Or, in other words, the square of the radius of gyration 
equals \( \sigma^2 \). Now let the width of rectangles decrease indefinitely,
their number increasing without limit, so that the rectangular 
histogram approaches the frequency curve as a limit. Then the 
radius of gyration of the rectangular histogram approaches 
the radius of gyration of the area under the frequency curve 
as a limit, and the standard deviation of the sample shown in 
the rectangular histogram approaches as a limit the standard 
deviation of the universe from which samples were taken if they 
are truly representative. 

16. Meaning of Radius of Gyration.—The meaning of radius of
gyration is that if a mass is rotating about an axis, the radius of 
gyration is such a distance that if the entire mass were concen¬ 
trated at that distance from the axis the rotational effect would 
be the same as for the actual distributed mass. In other words, 
it may be called an average rotational radius for all the particles 
of the rotating mass. In the same way standard deviation is an 
average deviation, or a measure of dispersion. 

17. Relation between s and \( \sigma \).—It is a well-known theorem in
mechanics that the square of the radius of gyration of an area
about an axis not through the center of gravity equals the square
of the radius of gyration about a parallel axis through the center
of gravity plus the square of the distance between the
axes. From this follows at once the relationship:

\[ s^2 = \sigma^2 + {x'}^2. \]

There are a number of interesting analogies between the
theorems of mechanics and those of statistical method.

18. Examples.—The method of computing standard deviation
from such sets of values of variables as those given in Table
XXXVII is obvious. A column of \( d_a^2 \) would be added to the table.
This column would be totaled, giving \( \sum d_a^2 \). This total would be
divided by the number of items, N, and the square root of the
quotient found. For example, for group I,

\[ \sum d_a^2 = 4, \quad N = 7. \]
\[ \sigma = \sqrt{4/7} = 0.76. \]

The average deviation,

\[ \delta = 0.57. \]

For groups II and IV,

\[ \sum d_a^2 = 292. \]
\[ \sigma = \sqrt{292/7} = 6.5. \]

The average deviation,

\[ \delta = 5.7. \]

For group III,

\[ \sum d_a^2 = 242. \]
\[ \sigma = \sqrt{242/7} = 5.9. \]

The average deviation,

\[ \delta = 4.86. \]

As would be expected, in each case

\[ \sigma > \delta. \]

19. Coefficient of Dispersion.—As in average deviation, here
it is not fair to compare \( \sigma \) of either group I or II with \( \sigma \) of either
group III or IV, on account of the great difference in averages.
So here, as was done before, coefficients of variation should be
used for comparison if the averages are not about equal. The
coefficient of dispersion, or coefficient of variation, equals \( \sigma / a \).
For the different groups of Table XXXVII, the coefficients of
variation are

Group    \( \sigma / a \)
I        0.01
II       0.10
III      0.009
IV       0.01


The coefficient is sometimes multiplied by 100 and is thus 
stated as a per cent. 
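
For a single group the standard deviation, the average deviation, and the coefficient of variation may be computed together, as in the following Python sketch. The function names are assumptions made for illustration; the data are the group III items of Table XXXVII.

```python
import math

def standard_deviation(values):
    """Root-mean-square deviation from the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def average_deviation(values):
    """Mean absolute deviation from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

group_iii = [612, 615, 619, 621, 623, 627, 630]
a = sum(group_iii) / len(group_iii)
sigma = standard_deviation(group_iii)
delta = average_deviation(group_iii)
coefficient_of_variation = sigma / a

print(round(sigma, 1), round(delta, 2))
# As a decimal, and multiplied by 100 to state it as a per cent.
print(round(coefficient_of_variation, 3), round(100 * coefficient_of_variation, 1))
```

The divisor here is N, as in the text, and not N - 1 as in some later treatments of sample standard deviation.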

The relative size of standard deviation and the average from 
which it is derived is important. If the standard deviation of the 
heights of the men in the military company of Sec. 2 of this 
chapter is 1 in. and that of the lengths of their noses is also 1 in., 
the spread, or scatter, of heights is the same as that of nose length.
But if the average height of the men is 69 in., the standard deviation
is insignificantly small. If the average nose length is 1.75
in., a standard deviation of 1 in. is very significantly large. The
standard deviation of weights of a random sample of elephants 
may be more than the standard deviation of weights of a random 
sample of cats and yet be insignificantly small as compared with 
that of cats. So, for purposes of comparison, relative values 
must be used. 

20. Computation from Frequency Table, Direct Method.— 
The method of working from a frequency table may be shown by 
use of Table XIV of freshmen heights. The table for computing 
\( \sigma \) by direct method of deviations from the arithmetic mean is
shown as Table XLI. 


Table XLI.—Standard Deviation of Heights, Direct Method

   X       f        fX        d_a      d_a^2      f d_a^2
  60.5      3      181.5     -7.68    58.9824     176.95
  62.0      8      496.0     -6.18    38.1924     305.54
  63.5     33    2,095.5     -4.68    21.9024     722.78
  65.0     51    3,315.0     -3.18    10.1124     515.73
  66.5    115    7,647.5     -1.68     2.8224     324.58
  68.0    156   10,608.0     -0.18     0.0324       5.05
  69.5    148   10,286.0      1.32     1.7424     257.88
  71.0     64    4,544.0      2.82     7.9524     508.95
  72.5     43    3,117.5      4.32    18.6624     802.48
  74.0      5      370.0      5.82    33.8724     169.36
  75.5      3      226.5      7.32    53.5824     160.75
Totals    629   42,887.5                        3,950.05

\[ a = \frac{42,887.5}{629} = 68.18. \]
\[ \sigma = \sqrt{\frac{3,950.05}{629}} = \sqrt{6.2799} = 2.51. \]
\[ \frac{\sigma}{a} = \frac{2.51}{68.18} = 0.037. \]


The first column, size of item X, shows midpoints of classes.
The second column is frequency f. The third column is fX. The
total of this column divided by N gives a, the arithmetic mean.

\[ a = \frac{42,887.5}{629} = 68.18. \]

The fourth column gives \( d_a \), deviations from the arithmetic
mean. The fifth column gives the squares of the deviations. If
this column were added, the sum would give only one \( d_a^2 \) for each
class. The sum of \( d_a^2 \) must be obtained for all N items; so the
sixth column of \( f d_a^2 \) is formed. The total of this column is
\( \sum d_a^2 \) for the entire table. Then

\[ \sigma = \sqrt{\frac{\sum d_a^2}{N}} = \sqrt{\frac{3,950.05}{629}}, \]
\[ \sigma = 2.51. \]

The coefficient of dispersion is

\[ \frac{\sigma}{a} = \frac{2.51}{68.18} = 0.037. \]
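
The direct method can be sketched in Python using the class midpoints and frequencies of Table XLI. The function name `grouped_mean_and_sd` is an assumption. The sketch carries the mean at full precision instead of rounding it to 68.18 before taking deviations, so intermediate sums may differ slightly from the tabulated ones.

```python
import math

# Class midpoints (inches) and frequencies of the freshman heights.
heights = [60.5, 62.0, 63.5, 65.0, 66.5, 68.0, 69.5, 71.0, 72.5, 74.0, 75.5]
freqs = [3, 8, 33, 51, 115, 156, 148, 64, 43, 5, 3]

def grouped_mean_and_sd(xs, fs):
    """Arithmetic mean and standard deviation from a frequency table."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    # Each squared deviation is weighted by its class frequency.
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs))
    return mean, math.sqrt(sum_sq / n)

mean, sigma = grouped_mean_and_sd(heights, freqs)
print(round(mean, 2), round(sigma, 2), round(sigma / mean, 3))
```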


Table XLII.—Short-cut Method

   X       f      d_X'     f d_X'     f d_X'^2
  60.5      3     -7.5     -22.5       168.75
  62.0      8     -6.0     -48.0       288.00
  63.5     33     -4.5    -148.5       668.25
  65.0     51     -3.0    -153.0       459.00
  66.5    115     -1.5    -172.5       258.75
  68.0    156      0         0           0
  69.5    148      1.5     222.0       333.00
  71.0     64      3.0     192.0       576.00
  72.5     43      4.5     193.5       870.75
  74.0      5      6.0      30.0       180.00
  75.5      3      7.5      22.5       168.75
Totals    629             +115.5     3,971.25

\[ X' = 68.0. \]
\[ x' = \frac{+115.5}{629} = 0.18. \]
\[ a = 68.0 + 0.18 = 68.18. \]
\[ \sigma = \sqrt{\frac{3,971.25}{629} - (0.18)^2} = \sqrt{6.3136 - 0.0324} = \sqrt{6.2812} = 2.51. \]









144 


A FIRST COURSE IN STATISTICAL METHOD 


21. Short-cut.—Table XLII shows the method of computation
by use of deviations from some arbitrary number $X'$. This is
the short-cut method. $X'$ should be chosen as one of the midpoints
of classes, so that the deviation for that class will be 0. Take
that class nearest the middle of the table or nearest the estimated
arithmetic mean or the class of greatest frequency. In this table
the sixth class satisfies all these conditions.

$$X' = 68.0.$$

The first and second columns are the same as in Table XLI.
The third column, $d_{x'}$, is deviation from $X'$. The deviation for
$X = 68.0$ is 0. The deviations above and below this are multiples
of the class interval, positive for $X > 68.0$ and negative for
$X < 68.0$. Each deviation is multiplied by the corresponding
frequency to give the fourth column. The total of this column,
$\Sigma d_{x'}$, divided by $N$ gives $x'$, the deviation of the arithmetic mean
$a$ from $X'$, the arbitrary origin of measurements. The following
equation has previously been stated:

$$a = X' + \frac{\Sigma d_{x'}}{N},$$

or

$$\frac{\Sigma d_{x'}}{N} = a - X'.$$

$x'$ was defined on page 137 as $a - X'$, so

$$x' = \frac{\Sigma d_{x'}}{N}.$$

Here

$$x' = \frac{115.5}{629} = 0.18,$$

$$a = X' + x' = 68.00 + 0.18 = 68.18.$$

The entries in the fifth column are formed by taking the
product of corresponding entries in the third and fourth columns,
giving $f d_{x'}^2$. The total of this column divided by $N$ is $s^2$. From
page 137,

$$s = \sqrt{\frac{\Sigma d_{x'}^2}{N}}.$$

Here,

$$s^2 = \frac{3{,}971.25}{629} = 6.3136.$$

From

$$s^2 = \sigma^2 + x'^2,$$

$$\sigma^2 = s^2 - x'^2.$$

From this table, then,

$$\sigma = \sqrt{6.3136 - 0.0324} = \sqrt{6.2812} = 2.51.$$

The coefficient of dispersion is

$$\frac{\sigma}{a} = \frac{2.51}{68.18} = 0.037.$$

In general,

$$\sigma = \sqrt{s^2 - x'^2} = \sqrt{\frac{\Sigma d_{x'}^2}{N} - \left(\frac{\Sigma d_{x'}}{N}\right)^2}.$$

Whichever form is preferred may be used.
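
The short-cut method can be traced step by step in the same way. This is a sketch only, in Python, with variable names chosen for illustration; it assumes the origin $X' = 68.0$ used in Table XLII and the relation $\sigma^2 = s^2 - x'^2$ stated above.

```python
from math import sqrt

# Class midpoints X and frequencies f, as in Table XLII.
X = [60.5, 62.0, 63.5, 65.0, 66.5, 68.0, 69.5, 71.0, 72.5, 74.0, 75.5]
f = [3, 8, 33, 51, 115, 156, 148, 64, 43, 5, 3]
X_origin = 68.0                                  # arbitrary origin X'

N = sum(f)
d = [xi - X_origin for xi in X]                  # deviations d_x' from X'
sum_fd = sum(fi * di for fi, di in zip(f, d))    # total of the f d_x' column
sum_fd2 = sum(fi * di ** 2 for fi, di in zip(f, d))  # total of f d_x'^2

x_shift = sum_fd / N                             # x', deviation of mean from X'
a = X_origin + x_shift                           # arithmetic mean
s_squared = sum_fd2 / N                          # s^2, mean square about X'
sigma = sqrt(s_squared - x_shift ** 2)           # sigma^2 = s^2 - x'^2

print(sum_fd, sum_fd2, round(a, 2), round(sigma, 2))
```

The column totals agree with Table XLII, and the mean and standard deviation agree with the direct method.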

22. If X' = 0.—It may be of interest to see what form the
formula for $\sigma$ takes when $X' = 0$. Deviations of the items
from 0 are the values of the items themselves.

So

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - a^2}.$$


Table XLIII.—Standard Deviation of Freshman Heights from X' = 0

    X         f          fX            fX^2
   60.5       3       181.5       10,980.75
   62.0       8       496.0       30,752.00
   63.5      33     2,095.5      133,064.25
   65.0      51     3,315.0      215,475.00
   66.5     115     7,647.5      508,558.75
   68.0     156    10,608.0      721,344.00
   69.5     148    10,286.0      714,877.00
   71.0      64     4,544.0      322,624.00
   72.5      43     3,117.5      226,018.75
   74.0       5       370.0       27,380.00
   75.5       3       226.5       17,100.75

   Totals   629    42,887.5    2,928,175.25

$$a = \frac{42{,}887.5}{629} = 68.18.$$

$$\sigma = \sqrt{\frac{2{,}928{,}175.25}{629} - (68.1836)^2} = \sqrt{4{,}655.2866 - 4{,}649.0033} = \sqrt{6.2833} = 2.51.$$

Table XLIII shows the computation for freshman heights.
The formula is very simple, but the work of calculation is increased
over that of Table XLII. The mean must here be carried to more
decimal places than 68.18, because the result is the small difference
of two large numbers; with $a = 68.18$ the difference would be
4,655.28 - 4,648.51 = 6.77, which gives 2.60 instead of 2.51.
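
How sensitive the $X' = 0$ form is to rounding of the mean can be seen from a short experiment. The sketch below is in Python and is not part of the original text; the names are illustrative assumptions. It computes $\sigma$ once with the unrounded mean and once with the mean rounded to two decimal places.

```python
from math import sqrt

# Class midpoints X and frequencies f, as in Table XLIII.
X = [60.5, 62.0, 63.5, 65.0, 66.5, 68.0, 69.5, 71.0, 72.5, 74.0, 75.5]
f = [3, 8, 33, 51, 115, 156, 148, 64, 43, 5, 3]

N = sum(f)
sum_fX = sum(fi * xi for fi, xi in zip(f, X))
sum_fX2 = sum(fi * xi ** 2 for fi, xi in zip(f, X))   # total of fX^2 column

a = sum_fX / N
# sigma^2 = (sum of X^2)/N - a^2, with the mean carried at full precision.
sigma = sqrt(sum_fX2 / N - a ** 2)
# Same formula, but with the mean rounded to 68.18 before squaring.
sigma_rounded_mean = sqrt(sum_fX2 / N - round(a, 2) ** 2)

print(sum_fX2, round(sigma, 2), round(sigma_rounded_mean, 2))
```

The two results differ in the first decimal place, which is why the short-cut method of Table XLII, working with small deviations, is the safer one for hand computation.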

The formula

$$\sigma = \sqrt{\frac{\Sigma X^2 - N a^2}{N}}$$

may be written

$$\sigma^2 = \frac{\Sigma X^2}{N} - a^2.$$

This translated into English reads: The square of standard
deviation equals the average square of the items diminished by
the square of the arithmetic mean.

If a distribution is approximately normal, it has been found
empirically that

$$\frac{\delta}{\sigma} = 0.8$$

nearly.

If the distribution is that of the normal probability curve, the
ratio $\delta/\sigma$ can be proved to be 0.7979.¹

23. Labor of Computation.—The labor of computation of 
average deviation is a little less than that for standard deviation. 
If computing machines or product tables and tables of squares are 
available, the amount of labor is about the same for each. Where 
the measure of dispersion is to be used in further mathematical 
work, the standard deviation lends itself to algebraic processes 
much more readily than does average deviation. 

24. $\sigma$ is an Average.—Standard deviation is an average in the
sense of the general definition of average. If

$$d_1 = d_2 = d_3 = \ldots = d_N = k,$$

then

$$\sqrt{\frac{\Sigma d^2}{N}} = \sqrt{\frac{N k^2}{N}} = k$$

equals each of the $d$'s.

25. $\sigma = \sqrt{npq}$.—In Chap. II, page 31, Theorem III, it was
stated that if $p$ is the probability of success of a given event in a
single trial and $q$ is the probability of its failure, then the probability
of $r$ successes and $n - r$ failures in $n$ trials is

$${}_nC_r\, p^r q^{n-r},$$

the $(n - r + 1)$th term in the binomial expansion of

$$(p + q)^n.$$

Always

$$p + q = 1.$$

The theoretical number of times, then, that one would expect
to get $r$ tails and $n - r$ heads in tossing $n$ coins $N$ times is
$N\,{}_nC_r\, p^r q^{n-r}$. In other words, the most probable frequencies of
occurrence of the different possible combinations of heads and
tails in tossing $n$ coins $N$ times are equal to the terms of the
binomial expansion of

$$N(p + q)^n,$$

where $p$ is the probability of throwing tails in a single toss of one coin
and $q$ is the probability of throwing heads. The various probabilities
for seven dimes were worked out on page 31, Chap. II,
and the actual frequencies in an experiment of tossing the seven
dimes 500 times were compared with the theoretical frequencies
in Table XVII. On page 78

$$\sigma = \sqrt{npq}$$

was used as a constant.

¹ Jones, "First Course in Statistics," p. 246.

It is now desired to show that this $\sigma$ is standard deviation of
the entire distribution of most probable frequencies as given by
the terms of

$$N(p + q)^n.$$

Since $N$ is a common factor of every term, the terms of the
expansion of

$$(p + q)^n$$

are the fractions of $N$ giving the corresponding frequencies.
Or, if $N$ is regarded as unity, these fractions themselves may be
called the frequencies. Now form a frequency table with columns
for getting arithmetic mean and standard deviation, as in Table
XLIV. The origin of measurements for deviations $X'$ is taken
at 0, as was done in Table XLIII.

Table XLIV.—Arithmetic Mean and Standard Deviation for Distributions Proportional to the Terms of $(p + q)^n$

The entries in the $f$ column are the terms of the expansion of

$$(p + q)^n.$$

Therefore,

$$N = \Sigma f = 1.$$

Since

$$p + q = 1,$$

$$(p + q)^n = 1.$$

In the $fX$ column, $nq$ is a factor of every term.

$$\Sigma fX = nq\left\{p^{n-1} + (n-1)p^{n-2}q + \frac{(n-1)(n-2)}{1 \cdot 2}p^{n-3}q^2 + \frac{(n-1)(n-2)(n-3)}{1 \cdot 2 \cdot 3}p^{n-4}q^3 + \ldots + q^{n-1}\right\}$$

$$= nq(p + q)^{n-1}$$

$$= nq,$$

since

$$(p + q)^{n-1} = 1.$$

In the $fX^2$ column, $nq$ is a common factor of every term.

$$\Sigma fX^2 = nq\left\{p^{n-1} + 2(n-1)p^{n-2}q + 3\,\frac{(n-1)(n-2)}{1 \cdot 2}p^{n-3}q^2 + 4\,\frac{(n-1)(n-2)(n-3)}{1 \cdot 2 \cdot 3}p^{n-4}q^3 + \ldots + nq^{n-1}\right\}$$

$$= nq\left\{\left[p^{n-1} + (n-1)p^{n-2}q + \frac{(n-1)(n-2)}{1 \cdot 2}p^{n-3}q^2 + \frac{(n-1)(n-2)(n-3)}{1 \cdot 2 \cdot 3}p^{n-4}q^3 + \ldots + q^{n-1}\right]\right.$$

$$\left. + \left[(n-1)p^{n-2}q + 2\,\frac{(n-1)(n-2)}{1 \cdot 2}p^{n-3}q^2 + 3\,\frac{(n-1)(n-2)(n-3)}{1 \cdot 2 \cdot 3}p^{n-4}q^3 + \ldots + (n-1)q^{n-1}\right]\right\}$$

$$= nq\left\{(p + q)^{n-1} + (n-1)q\left[p^{n-2} + (n-2)p^{n-3}q + \frac{(n-2)(n-3)}{1 \cdot 2}p^{n-4}q^2 + \ldots + q^{n-2}\right]\right\}$$

$$= nq\left\{(p + q)^{n-1} + (n-1)q(p + q)^{n-2}\right\}$$

$$= nq\{1 + (n-1)q\}$$

$$= nq + n^2q^2 - nq^2.$$

$$a = \frac{\Sigma fX}{N} = \frac{nq}{1} = nq.$$

With

$$X' = 0 \text{ (see p. 145)},$$

$$x' = a,$$

$$\sigma^2 = \frac{nq + n^2q^2 - nq^2}{1} - n^2q^2$$

$$= nq - nq^2$$

$$= nq(1 - q)$$

$$= npq.$$

So

$$\sigma = \sqrt{npq}.$$

If $p$ and $q$ are the probabilities of throwing tails or heads,
respectively, of a single coin in a single trial, then

$$p = q = \tfrac{1}{2}.$$

If $p$ is the probability of throwing a 5 in a single throw of a
single die, then

$$p = \tfrac{1}{6}.$$

The probability of failure to get a 5 is

$$q = \tfrac{5}{6}.$$

Other values of $p$ and $q$ apply to other events. The above value
of standard deviation applies to distributions which follow the
terms of the binomial expansion of

$$(p + q)^n,$$

no matter what values $p$ and $q$ may have.
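
The result $a = nq$, $\sigma = \sqrt{npq}$ can be verified numerically by forming the frequency table outright. The following Python sketch is an illustration only; the function name `binomial_mean_sigma` and the choice of twelve dice are assumptions made for the example, not taken from the text. It takes the terms of $(p + q)^n$ as frequencies, with $X$ the exponent of $q$ and the origin at 0.

```python
from math import comb, isclose, sqrt


def binomial_mean_sigma(n, q):
    """Mean and standard deviation of X = 0..n with the terms of (p+q)^n as frequencies."""
    p = 1.0 - q
    # The term containing q^x is the frequency of X = x.
    freqs = [comb(n, x) * p ** (n - x) * q ** x for x in range(n + 1)]
    N = sum(freqs)                                   # equals 1 up to rounding
    a = sum(fr * x for x, fr in enumerate(freqs)) / N
    mean_square = sum(fr * x * x for x, fr in enumerate(freqs)) / N
    return a, sqrt(mean_square - a * a)              # sigma^2 = mean square - a^2


# Seven coins, p = q = 1/2.
a_coin, sigma_coin = binomial_mean_sigma(7, 0.5)
# Twelve dice (a hypothetical number), p = 1/6 for a 5 and q = 5/6 for failure.
a_die, sigma_die = binomial_mean_sigma(12, 5 / 6)

print(a_coin, sigma_coin, a_die, sigma_die)
```

In both cases the tabulated mean and standard deviation agree with $nq$ and $\sqrt{npq}$ to within rounding error.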

26. Quartiles and Other Division Points.—Sometimes it is desirable
to regard the distribution of the middle portion of the array
only. This, of course, leaves the extreme items entirely out of
consideration. This would be the case if there is reason for
believing that the extreme items are merely freaks that should
not influence the conclusions in regard to distribution. Moreover,
the statistical constant called probable error, discussed later,
is concerned with the range of the middle half of the array. So
it becomes at once a matter of importance to find the two division
points of the array known as quartiles. Definitions and
formulas for finding these and other division points in general,
including the median, will now be given.

27. Placing in Array.—Given N items. Place them in array 
on a horizontal base line, letting each item occupy unit space on 
this line. The length of this line is N. In arraying the items, 
place the smallest item at the left-hand end of the base line, 
occupying the segment from 0 to 1, the next larger item occupying 
the segment from 1 to 2, etc. The largest item will occupy the 
Nth unit segment of the base line. Now draw a curve through
the tops of the items as arrayed. 

28. The median is the value of the ordinate to this curve half
way through the array. This is the ordinate at the midpoint of
the base line. It stands at the horizontal scale mark, $N/2$.
It is evident, if $N$ is even, that exactly half the items are each
less than the median and half the items are each greater than the
median. If $N$ is odd, the median ordinate will stand at the middle
of the center unit segment of the base line. The item standing
on this center segment may be regarded as belonging half to
those items less than the median and half to those items greater
than the median. It is true that exactly half the items lie to
the left of the median and half to the right, the middle item being
divided into halves by the median. It would, then, be correct
to say that half the items are less than the median and half are
larger than the median. In either case it is strictly true that
there are as many items less than the median as there are greater
than the median.


29. The first quartile is the value of the ordinate one-fourth
of the way through the array from the zero end of the base line.
It is the ordinate at one-fourth of the length of the base line from
its zero end or three-fourths of the length of the base line from the
other end. It stands at the horizontal scale mark $N/4$. It is
evident, if $N$ is a multiple of 4, of the form $N = 4n$, that exactly
one-fourth of the items are each less than the first quartile and
three-fourths are each greater than the first quartile. If $N$ is of
the form $4n + 1$, $4n + 2$, or $4n + 3$, then $N/4$ equals $n + 1/4$,
$n + 1/2$, or $n + 3/4$, respectively. The ordinate measuring the
first quartile is then at 1/4, 1/2, or 3/4 of a unit from the beginning of
the unit segment upon which it stands. If each item is regarded
as occupying the entire unit segment upon which it stands, then
1/4, 1/2, or 3/4, respectively, of the item on that unit segment is to
the left of the quartile and 3/4, 1/2, or 1/4 of it to the right of the
quartile. It is true, then, that exactly 1/4 of the $N$ items are to
the left of the first quartile and 3/4 of them to the right. If one
may regard 1/4, 1/2, or 3/4, respectively, of the item standing on
this segment as belonging with those items less than the first
quartile, and 3/4, 1/2, or 1/4, respectively, of this item as belonging
with those items greater than the first quartile, then it would be
correct to say that 25 per cent of the items are each less than the
first quartile and 75 per cent are each greater than the first quartile.

To illustrate: If $N = 24$, $N/4 = 6$, and the ordinate measuring
the first quartile is at six units from the beginning of the base line;
6 items, 25 per cent of the total, are each less than the first quartile
and 18 items, 75 per cent of the total, are each greater than the
first quartile. If $N = 25$, $N/4 = 6.25$. The ordinate measuring
the first quartile is at 6.25 units from the zero end of the base line.
Thus 6 1/4 items are regarded as each less than the first quartile,
and 18 3/4 items as each greater than the first quartile. Again
25 per cent of the items are each less than the first quartile
and 75 per cent of them each greater.

From the above definition and illustration, the definition of
the first quartile may be justified as being such a quantity that 25
per cent of the items are each less than the first quartile and 75
per cent are each larger. The first quartile need not be equal to
one of the items in the array. If it happens to equal one of the
items, that item is divided and regarded as belonging half to those
items less than the first quartile and half to those items greater.

30. The third quartile is the value of the ordinate to the curve
three-fourths of the way through the array from the zero end of
the base line. It is the ordinate at three-fourths of the length
of the base line from the zero end and one-fourth of the length of
the base line from the other end. It stands at the horizontal
scale mark, $3N/4$. If $N$ is a multiple of 4, it is evident that
exactly three-fourths of the items are each less than the third
quartile and one-fourth are each greater. If $N$ is not a multiple
of 4, assumptions are made similar to those used in the discussion
of the first quartile. Under these assumptions the third quartile
may be defined as being such a quantity that 75 per cent of the
items are each less than the third quartile and 25 per cent are
each larger.

31. Other Division Points.—Sometimes octiles, deciles, or 
percentiles are used. The first octile is the value of the ordinate to 
the curve at one-eighth of the way through the array, the second 
octile, the value of the ordinate two-eighths of the way through the
array, and so on for the other octiles. The first decile is the value
of the ordinate to the curve at one-tenth of the way through the
array, the second decile two-tenths, and so on for each decile.
Each of the above division points may be expressed as a percentile,
thus making percentile a more general term for a division point. 
The pth percentile is the ordinate to the curve at p per cent of the 
way through the array. 

32. Notation.—The following notation will be used:

$M$ = median.
$Q_1$ = first quartile.
$Q_3$ = third quartile.
$O_1$ = first octile.
$O_3$ = third octile.
$D_4$ = fourth decile.
$P_p$ = $p$th percentile.

$$P_{50} = D_5 = O_4 = Q_2 = M.$$

Similar equalities in different parts of the array are readily
seen to hold.


33. Significance of Division Points.—As was shown in the chapter
on Averages, $M$ is that size such that the sum of the absolute
values of the deviations of all items from it is less than the sum
of the absolute values of the deviations from any other size.
$Q_1$ and $Q_3$ are such ordinates that exactly the middle half of
the items of the array lie between them. $Q_3 - Q_1$ is called the
interquartile range. It is an important quantity. Dividing the
array by means of any two octiles or two deciles segregates a
definite part of the array which one may wish to study. The
percentiles divide the array into 100 parts each containing 1
per cent of the total number of items in the array. By means of
proper selection of percentiles, the array can be divided in almost
any manner for purposes of study and comparison. If population
of a locality is arrayed according to wealth, that portion of
population between $P_{50}$ and $P_{95}$ is sometimes called the middle
class. It excludes the poorer half and the richest 5 per cent of
the population. Not infrequently this middle class has about
half of the wealth.

34. Determination of Median, Quartiles, and Other Division 
Points.—If there is a relatively small number N of items in a 
sample, it is easy to determine the division points desired by 
placing the items in array and counting up to the point in 
question. 

If N is relatively large, a frequency table may either be given 
or be formed from the original tabulation of measurements. 
Suppose an array of items is given as shown by a frequency table 
consisting of the first two columns of Table XLV. Although N 
(= 24) is here relatively small, the table serves the purpose of 
illustration. Form the cumulative columns F and F'. 


Table XLV.—Determination of Division Points 



On a horizontal line (base line) (Fig. 43) plot the class boundaries
vertically. Plot the lowest boundary, 2.5, at 0. Plot the next
boundary, 7.5, at 2, the frequency of the first class. Plot all the
class boundaries so that consecutive boundaries are at distances
apart equal to the corresponding class frequencies. This places
them at values of the cumulative frequencies $F$. The ordinates
representing the class boundaries of the table are drawn in dotted
lines. Connect the top of each dotted line to the top of the next
by a straight line. These straight lines form a broken line,
ordinates to which measure the size of items. Let each item
occupy a unit segment of the base line, the smallest item standing
on the segment 0 to 1, the next larger on 1 to 2, and so on to the
largest, which stands on the segment 23 to 24. Not knowing the
exact distribution of sizes in any class, assume uniform variation
throughout each class. The size of any item is then measured by
the ordinate to the broken line at the midpoint of the unit segment
of the base line on which that item stands. Thus the size
of the fifth item of the array (third in the second class) is, under
the assumption of uniform variation, equal to 10, the ordinate
to the broken line at the midpoint of the base line segment
4 to 5. Ordinates at the proper places measure the required
division points.

35. Derivations of Formulas.—Now determine a formula for
finding any division point. This includes the formula for median
which was previously derived in Chap. IV. The process is
repeated here for the sake of generality for all division points.

$N$ = total number of items.
$c$ = class interval.
$f$ = frequency of class in which the required point lies.
$l$ = lower boundary of class in which the required point lies.
$l'$ = upper boundary of class in which the required point lies.
$F_P$ = cumulative frequency up to the class in which the
required point lies. For the median, $P = M$; for the
first quartile, $P = Q_1$; and similarly for other division
points.
$F'_P$ = cumulative frequency down to the class in which the
required point lies.
$i$ = number of units of base line that must be passed from
the lower boundary of the class in which the required
point lies to reach the required point.
$i'$ = number of units that must be passed from the upper
boundary to the required point.
$x$ = amount that must be added to $l$ to give the value sought.
$x'$ = amount that must be subtracted from $l'$ to give the
value sought.

Suppose it is desired to find the median $M$. In the given table
$N = 24$, $c = 5$. $N/2 = 12$, the base-line scale mark at which
the median stands. At 12 erect the ordinate to $B_M$. This
ordinate measures $M$. Examining the $F$ column of the table,
it is found that the twelfth item is in the third class. So $f = 9$,
$l = 12.5$, $F_M = 7$, $i = N/2 - F_M = 12 - 7 = 5$. As shown in
Fig. 43, $x = C_M B_M$, the amount that must be added to $l$ (the
ordinate from 7 to $A_M$), the lower boundary of the class in which
$M$ lies, to give the value of $M$. Also $c = E_M D_M$, $f = A_M E_M$,
and $i = A_M C_M$. From the similar triangles $A_M B_M C_M$ and $A_M D_M E_M$,

$$\frac{C_M B_M}{E_M D_M} = \frac{A_M C_M}{A_M E_M}.$$

Substituting the above values and solving for $x$,

$$\frac{x}{c} = \frac{i}{f}, \qquad x = \frac{ci}{f}.$$

Therefore,

$$M = l + x = l + \frac{ci}{f}.$$

Using the values of the given example,

$$M = 12.5 + \frac{5 \times 5}{9},$$

$$M = 12.5 + 2.8,$$

$$M = 15.3.$$

Working from the upper values down: $N = 24$, $c = 5$, $N/2 = 12$.
From the $F'$ column of the table it is seen that the twelfth
item from the upper end of the distribution is in the fourth class
from the bottom of the table. So $f = 9$, $l' = 17.5$, $F'_M = 8$,
$i' = N/2 - F'_M = 12 - 8 = 4$.

In the figure, $x' = F_M D_M$, the amount to be subtracted from
$l'$ (the ordinate from 16 to $D_M$) to give $M$ (the ordinate from 12 to $B_M$).

From triangles $B_M D_M F_M$ and $A_M D_M E_M$,

$$\frac{F_M D_M}{E_M D_M} = \frac{B_M F_M}{A_M E_M}.$$

Whence,

$$x' = \frac{ci'}{f}.$$

Then,

$$M = l' - x' = l' - \frac{ci'}{f}.$$

In the given example:

$$M = 17.5 - \frac{5 \times 4}{9},$$

$$M = 17.5 - 2.2,$$

$$M = 15.3.$$

From the manner of derivation of the formulas, the two results
are necessarily the same.

For the third quartile, $Q_3$, the required ordinate is erected at
the base-line scale mark $3N/4$. With the given Table XLV and
Fig. 43 this scale mark is $\frac{3 \times 24}{4} = 18$. By examination of the
triangles $A_{Q_3} B_{Q_3} C_{Q_3}$ and $A_{Q_3} D_{Q_3} E_{Q_3}$, it is easily seen that

$$Q_3 = l + x = l + \frac{ci}{f}.$$

Or from triangles $B_{Q_3} D_{Q_3} F_{Q_3}$ and $A_{Q_3} D_{Q_3} E_{Q_3}$,

$$Q_3 = l' - x' = l' - \frac{ci'}{f}.$$

In the given table the eighteenth item falls in the fourth class.
Then,

$$f = 4, \quad l = 17.5, \quad F_{Q_3} = 16, \quad i = \frac{3N}{4} - F_{Q_3} = 18 - 16 = 2,$$

$$Q_3 = l + \frac{ci}{f} = 17.5 + \frac{5 \times 2}{4},$$

$$Q_3 = 17.5 + 2.5,$$

$$Q_3 = 20.0.$$

Or, from the upper end $N/4$ (= 6) items must be passed to reach
$Q_3$. From column $F'$ the sixth item from the largest is in the
third class from the bottom of the table. Then $f = 4$, $l' = 22.5$,
$F'_{Q_3} = 4$, $i' = N/4 - F'_{Q_3} = 6 - 4 = 2$,

$$Q_3 = l' - \frac{ci'}{f} = 22.5 - \frac{5 \times 2}{4},$$

$$Q_3 = 22.5 - 2.5,$$

$$Q_3 = 20.0,$$

the same as from the other direction.
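
The two forms of the formula are easily compared by a few lines of arithmetic. The sketch below, in Python, is offered as an illustration only; the function names `from_below` and `from_above` are assumptions, and the numerical values are those quoted above for the median and third quartile of Table XLV.

```python
def from_below(l, c, i, f):
    """Division point reached from the lower boundary: l + c*i/f."""
    return l + c * i / f


def from_above(l_upper, c, i_upper, f):
    """Division point reached from the upper boundary: l' - c*i'/f."""
    return l_upper - c * i_upper / f


N, c = 24, 5

# Median: f = 9, l = 12.5, F_M = 7, and l' = 17.5, F'_M = 8.
median_below = from_below(12.5, c, N / 2 - 7, 9)
median_above = from_above(17.5, c, N / 2 - 8, 9)

# Third quartile: f = 4, l = 17.5, F = 16, and l' = 22.5, F' = 4.
q3_below = from_below(17.5, c, 3 * N / 4 - 16, 4)
q3_above = from_above(22.5, c, N / 4 - 4, 4)

print(round(median_below, 1), round(median_above, 1), q3_below, q3_above)
```

The agreement follows from $i + i' = f$ and $l' - l = c$, so that $l + ci/f$ and $l' - ci'/f$ are the same quantity.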

36. Formulas the Same for All Division Points.—The formulas
derived for $Q_3$ are of exactly the same form as those for $M$. It
is evident from the geometry of the figure that the same formulas
hold for any division point whatsoever.

Let the student determine from both directions the value of
$Q_1$ and of $P_{30}$.

37. Certain Division Points for Freshman Heights.—Suppose
it be required to find the values of $M$, $Q_1$, $Q_3$, $D_1$, $D_9$, $O_3$, and $P_{85}$
for freshman heights as given in Table XIV. The work may be
arranged somewhat as in Table XLVI.

Table XLVI.—Computation of Division Points for Freshman Heights

Division   Horizontal scale mark      l         i        f    Computation
point
M          N/2   = 314.5            67.25    104.50    156    67.25 + 1.5 × 104.50/156 = 68.25
Q_1        N/4   = 157.25           65.75     62.25    115    65.75 + 1.5 × 62.25/115  = 66.56
Q_3        3N/4  = 471.75           68.75    105.75    148    68.75 + 1.5 × 105.75/148 = 69.82
D_1        N/10  = 62.9             64.25     18.9      51    64.25 + 1.5 × 18.9/51    = 64.81
D_9        9N/10 = 566.1            70.25     52.1      64    70.25 + 1.5 × 52.1/64    = 71.47
O_3        3N/8  = 235.875          67.25     25.875   156    67.25 + 1.5 × 25.875/156 = 67.50
P_85       0.85N = 534.65           70.25     20.65     64    70.25 + 1.5 × 20.65/64   = 70.73

N = 629
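
The arrangement of Table XLVI lends itself to a single routine that locates the class, finds $i$, and applies $l + ci/f$. The following Python sketch is an illustration under stated assumptions: the function name `division_point` and the dictionary of scale marks are introduced here for the example, and the class boundaries are taken as the midpoints of Table XIV less half the class interval of 1.5.

```python
# Class midpoints and frequencies of freshman heights (Table XIV).
X = [60.5, 62.0, 63.5, 65.0, 66.5, 68.0, 69.5, 71.0, 72.5, 74.0, 75.5]
f = [3, 8, 33, 51, 115, 156, 148, 64, 43, 5, 3]
c = 1.5                                  # class interval
lower = [x - c / 2 for x in X]           # lower class boundaries l


def division_point(fraction):
    """Value below which the given fraction of the N items lies."""
    target = fraction * sum(f)           # horizontal scale mark, e.g. N/4
    F = 0                                # cumulative frequency below the class
    for l, fi in zip(lower, f):
        if fi > 0 and F + fi >= target:
            i = target - F               # units to pass within the class
            return l + c * i / fi
        F += fi
    raise ValueError("fraction must lie between 0 and 1")


marks = {"M": 1 / 2, "Q1": 1 / 4, "Q3": 3 / 4, "D1": 1 / 10,
         "D9": 9 / 10, "O3": 3 / 8, "P85": 0.85}
points = {name: round(division_point(frac), 2) for name, frac in marks.items()}
print(points)
```

Each entry is obtained by the same steps as a row of the table: the scale mark, the class in which it falls, the distance $i$ into that class, and the interpolation $l + ci/f$.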


38. Quartile Measure of Dispersion.—In the middle half of the
array, that is, the portion between the first and third quartiles,
the distribution is quite likely to be fairly uniform. This is
especially true in an approximately normal distribution, where the
classes of greatest frequency occur near the middle of the table.
The problem of obtaining a measure of dispersion for such
distributions is simplified if
only the middle half of the array is used. As the variation is
fairly uniform between the first and third quartiles, the size of
the median will be very nearly equal to the arithmetic mean of
the two quartiles.

$$M = \frac{Q_1 + Q_3}{2}.$$

The absolute values of the deviations of the two quartiles
from the median will be nearly equal and will serve as a measure of
dispersion. The quartile measure of dispersion, then, equals one-half
of the interquartile range. It equals

$$M - Q_1 = Q_3 - M = \frac{Q_3 - Q_1}{2}.$$

39. Quartile Coefficient of Dispersion.—The corresponding
coefficient of dispersion would be the measure of dispersion
divided by that average from which the deviation was taken.
Quartile coefficient of dispersion equals

$$\frac{Q_3 - Q_1}{2} \div \frac{Q_3 + Q_1}{2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}.$$

The quartile measure of dispersion is very easily computed for 
distributions which can be put in array. In this respect it has an 
advantage over average deviation or standard deviation. In 
case of normal distribution, quartile deviation is about 0.67 of 
standard deviation. Quartile deviation is related to probable 
error to be considered later. 
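
As an illustration, the quartile deviation and quartile coefficient of dispersion may be computed for freshman heights. The Python sketch below is not from the original text; the function name `division_point` is an assumption, and the quartiles are found by the interpolation $l + ci/f$ from the frequencies of Table XIV rather than copied from a table.

```python
# Class midpoints and frequencies of freshman heights (Table XIV).
X = [60.5, 62.0, 63.5, 65.0, 66.5, 68.0, 69.5, 71.0, 72.5, 74.0, 75.5]
f = [3, 8, 33, 51, 115, 156, 148, 64, 43, 5, 3]
c = 1.5                                  # class interval
lower = [x - c / 2 for x in X]           # lower class boundaries


def division_point(fraction):
    """Interpolated value below which the given fraction of items lies."""
    target = fraction * sum(f)
    F = 0                                # cumulative frequency so far
    for l, fi in zip(lower, f):
        if fi > 0 and F + fi >= target:
            return l + c * (target - F) / fi
        F += fi
    raise ValueError("fraction must lie between 0 and 1")


Q1 = division_point(0.25)
Q3 = division_point(0.75)
quartile_deviation = (Q3 - Q1) / 2               # half the interquartile range
quartile_coefficient = (Q3 - Q1) / (Q3 + Q1)     # quartile coefficient of dispersion

print(round(quartile_deviation, 2), round(quartile_coefficient, 3))
```

On these data the quartile deviation comes to about 1.63, roughly 0.65 of the standard deviation 2.51 found earlier, which is near the 0.67 stated for a normal distribution.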

40. Quartile Deviation a Rough Measure—Since one-fourth of 
the items are each less than the first quartile, one-fourth between 
the first quartile and the median, one-fourth between the median 
and the third quartile, and one-fourth larger than the third 
quartile, these three division points divide the entire set into four 
groups with the same number of items in each. The first quartile 
is the median of that half of the array smaller than the median of 
the whole array. The third quartile is the median of the other 
half of the array. Each is thus a type form for half of the array 
and thus gives a rough idea of the distribution on each side of the 
median. So quartile deviation from the median gives a rough
measure of dispersion of the entire group.



41. Frequency Classes Instead of Size Classes.—Instead of
dividing the entire array of $N$ items into four groups of $N/4$
each, a slightly more minute division may be made into eight
groups of $N/8$ items each, or ten groups of $N/10$ items each.
To obtain a still more minute analysis, the entire set is divided into
100 groups of $N/100$ items each. The division points are octiles,
deciles, and percentiles, respectively. The result is that the $N$
items are classified into groups of equal numbers (frequencies)
each, instead of equal variations in size. For example, plotting
the values of the deciles of freshman heights computed from
Table XIV, Fig. 44 is obtained. The plotted values are connected
by straight lines.

Fig. 44.—Deciles of freshman heights. (Horizontal axis: decile number.)

Deciles of Freshman Heights

   Decile number    Height
        1           64.81
        2           66.22
        3           66.97
        4           67.65
        5           68.31
        6           68.86
        7           69.50
        8           70.14
        9           71.47


This graph shows that the variation from decile to decile is 
very uniform except at the two extremes of the table. 

42. Division Points Graphically—Any or all of the various 
division points may be obtained graphically from the ogive. 
For the pth percentile, lay off from the base line p/100ths of
the longest ordinate to the curve. At this point draw a parallel to 
the base line. Where this parallel meets the curve, drop a per¬ 
pendicular to the base line. At the point where it meets the 
base line, read the value of the pth percentile on the base line 
scale. Since any ordinate to the ogive measures the cumulative 
frequency up to the point where that ordinate stands, the ordi¬ 
nate as above constructed locates a division point such that p 
per cent of the items are less than the value at that point and all 
the rest are greater. The ogive is thus the most convenient 
means of getting a set of division points graphically. 

43. Comparison of Measures of Dispersion.—Three distinct 
measures of dispersion have been studied. The question natu¬ 
rally arises as to which one to use in any given case. If the dis¬ 
tribution is nearly symmetrical or only a rough estimate is needed, 
the quartile deviation does very well. If, however, the distribu¬ 
tion is quite unsymmetrical, or if there are violent fluctuations in 
frequency from class to class, the quartile deviation is not very 
representative. Then average deviation would be better because 
it takes into account all deviations from the average used. 
Standard deviation also takes all deviations into account and is 
highly susceptible to algebraic manipulation. Standard devia¬ 
tion is derived from the arithmetic mean and would not be appro¬ 
priate if some other average is used. The arithmetic mean is 
the most commonly used average and its associate, standard 
deviation, the most commonly used measure of dispersion. 
Especially is this true where mathematical processes are involved. 
Standard deviation gives more weight to extreme items than do 
the other measures of dispersion. Sometimes in statistics of busi¬ 
ness or of economics, it is important to bring out the influence of 
these extreme items. If the chief interest lies in that which is 
characteristic rather than in the exceptional case, the average 
deviation is likely to be more useful. If it is desired to shut out 
the exceptional cases entirely, then the quartile deviation may 
be used. In order to form a complete judgment of the situation, 
it is well to compute all three measures of dispersion. These,
compared with the average and with each other, give a very
comprehensive view.

In comparing dispersions of two or more variables having differ¬ 
ent averages, do not forget that the comparison should be made in 
terms of coefficient of dispersion instead of measure of dispersion. 

44. Lorenz Curve.—One graphical means of showing dispersion 
is the Lorenz curve. It is usually applied to distribution of wealth,
wages, income, and similar distributions, though it may be 
applied to any frequency distribution. The method is best shown 
by an illustrative example from Table XLVII. 


Table XLVII.—Distribution of Wealth in City A

Classes, in thousands    Number of persons,    Amount of wealth, in
of dollars               in thousands          millions of dollars
    0-  4                     120                    300
    5-  9                      50                    375
   10- 24                      22                    385
   25- 49                       5                    187.5
   50- 99                       2                    150
  100-149                       1                    125

  Total                       200                  1,522.5


Table XLVIII.—Distribution of Wealth in City A, Cumulative

Classes, in thousands    Cumulative number of     Cumulative amount of wealth,
of dollars               persons, in thousands    in millions of dollars
    0-  4                     120                      300
    5-  9                     170                      675
   10- 24                     192                    1,060
   25- 49                     197                    1,247.5
   50- 99                     199                    1,397.5
  100-149                     200                    1,522.5


From Table XLVII a cumulative table, Table XLVIII, for
number of persons and amount of wealth is constructed. Table
XLIX is made by reducing the cumulative columns of Table
XLVIII to percentages.




Table XLIX.—Cumulative Percentages of Number of People and
Amount of Wealth Possessed by Them in City A

Cumulative per cent,     Cumulative per cent,
number of persons        amount of wealth
      60                       20
      85                       44
      96                       70
      98.5                     82
      99.5                     92
     100                      100



Table XLIX shows the percentage of wealth belonging to differ¬ 
ent percentages of the people and the percentage of population 
having various percentages of wealth. The data of Table XLIX 
are plotted and the points connected by straight lines or smoothed 
by a curve. Figure 45 shows a smooth curve through the points 
plotted. The curve starts with 0 per cent of population having 0
per cent of wealth. It is a long jump to the next point, 60 per
cent of the population with 20 per cent of the wealth. In draw¬ 
ing this part of the curve, it would be necessary to judge its posi¬ 
tion by the form and direction of the right-hand part of the curve 
or else obtain data giving smaller class intervals for the percent¬ 
ages of the poorer portion of the people. The curve as drawn is 
probably nearer the truth than straight lines connecting the 
points would be. 
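
The reduction from Table XLVII to the cumulative percentages plotted for the Lorenz curve is a matter of running totals. The Python sketch below is an illustration, not part of the original text; the function name `cumulative_percent` is an assumption, and the figures are those of Table XLVII.

```python
# Table XLVII: persons in thousands, wealth in millions of dollars, by class.
persons = [120, 50, 22, 5, 2, 1]
wealth = [300, 375, 385, 187.5, 150, 125]


def cumulative_percent(values):
    """Running totals of the values, expressed as per cent of the grand total."""
    total = sum(values)
    running = 0
    percents = []
    for v in values:
        running += v
        percents.append(100 * running / total)
    return percents


pct_persons = cumulative_percent(persons)   # abscissas of the Lorenz curve
pct_wealth = cumulative_percent(wealth)     # ordinates of the Lorenz curve

for x, y in zip(pct_persons, pct_wealth):
    print(round(x, 1), round(y))
```

Together with the starting point (0, 0), these pairs are the points through which the curve is drawn; rounded to whole per cents, the wealth column agrees with Table XLIX.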

45. Interpretation of Lorenz Curve.—Assuming this continuous
curve to be true throughout its length, the per cent of wealth
owned by any given per cent of the population, or the per cent of 
people owning any specified portion of the wealth, may be read 
from the graph. For instance, the poor half of the population 
has about 14 per cent of the wealth, as shown by the ordinate at 
50 on the horizontal scale. The richest 5 per cent of the popula¬ 
tion has about 33 per cent of the wealth. This is determined by 
reading the ordinate at 95 per cent on the horizontal scale and 
subtracting the result from 100. That is, at 5 per cent less than 
100 per cent of population, the curve is 33 per cent below 100 per 
cent of the wealth. The so-called middle class has what per cent 
of the wealth? This may be determined by computation from the 
figures read in the two cases above or may be taken directly 
from the graph. On the graph, take off with a divider the differ¬ 
ence of ordinates at 50 and 95 per cent of the population. Trans¬ 
fer this difference to the per cent of wealth scale and read the 


answer, 53 per cent. 

Half of the wealth is in possession of what per cent of the richer 
people? At 50 per cent of wealth on the curve, read the abscissa 
on per cent of population scale. It reads about 88.5 per cent, 
11.5 per cent less than 100 per cent. Then half of the wealth is in
the possession of 11.5 per cent of the people. And so with any
section of the wealth or of population, the percentages may be
read from this graph.

The dotted diagonal is the line of equal distribution showing
what the graph would be if wealth were equally distributed
throughout the entire population. The farther the actual graph
departs from this line the more unequal is the distribution, the
greater the dispersion.

By plotting, in the same figure, Lorenz curves for different
localities, the dispersion may be compared. Or, by plotting for
the same locality Lorenz curves for different dates, it may at





once be seen whether distribution is becoming more or less uni¬ 
form. The Lorenz curve gives a very striking comparison and 
is perhaps more easily interpreted by the layman than is either 
the ordinary frequency curve or the ogive. 
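
The readings described above can also be approximated by computation. The following is a minimal sketch in Python, assuming straight lines between the points of Table XLIX rather than the smoothed curve of Fig. 45; the names PERSONS, WEALTH, and interpolate are illustrative only and do not come from the text.

```python
# Cumulative per cents from Table XLIX, with the origin (0, 0) prepended.
PERSONS = [0.0, 60.0, 85.0, 96.0, 98.5, 99.5, 100.0]
WEALTH = [0.0, 20.0, 44.0, 70.0, 82.0, 92.0, 100.0]


def interpolate(x, xs, ys):
    """Read y at x from straight lines joining the points (xs[i], ys[i])."""
    for i in range(1, len(xs)):
        if xs[i - 1] <= x <= xs[i]:
            fraction = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + fraction * (ys[i] - ys[i - 1])
    raise ValueError("x lies outside the tabulated range 0 to 100")


# Share of wealth held by the poorer half of the population.
poorer_half = interpolate(50.0, PERSONS, WEALTH)

# Share held by the richest 5 per cent: 100 minus the ordinate at 95.
richest_five = 100.0 - interpolate(95.0, PERSONS, WEALTH)

# Share held by those between 50 and 95 per cent of the population.
middle = 100.0 - poorer_half - richest_five

# Per cent of the richer people who hold half of the wealth.
holders_of_half = 100.0 - interpolate(50.0, WEALTH, PERSONS)

print(round(poorer_half, 1), round(richest_five, 1),
      round(middle, 1), round(holders_of_half, 1))
```

With straight-line interpolation the four readings come out near 16.7, 32.4, 51.0, and 12.5 per cent, against 14, 33, 53, and 11.5 read from the smooth curve. The largest difference is in the long first interval from 0 to 60 per cent of the population, which is where the text notes that straight lines are probably farther from the truth.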


Exercises 


1. Using the data of Table XIV, find the average deviation of freshman 
heights from the arithmetic mean, making a table the form of Table XXXIX. 

2. Find the average deviation of freshman heights from the median, 
making a table of the form of Table XL. 

3. Find the average deviation of freshman heights from the modal height. 

4. Find the average deviation of freshman weights from the modal 
weight, using the data of Table XIII. 

5. Find the average deviation from the median of monthly precipitation
at Seattle.

6. Find the average deviation from the arithmetic mean of monthly
precipitation at Seattle.

7. Find the average deviation from the mode of monthly precipitation 
at Seattle. 

8. Find the average deviation from the mode of monthly mean tempera¬ 
tures at Seattle. 

9. Find the average deviation from the median of monthly mean tem¬ 
peratures at Seattle. 

10. Find the average deviation from the arithmetic mean of monthly 
mean temperatures at Seattle. 

11. Find the corresponding coefficient of dispersion for each of the meas¬ 
ures of dispersion that have been found in Exs. 1 to 10 inclusive. 

12. Get data of monthly mean temperatures and monthly precipitation 
from your nearest local weather bureau office and find measures of dispersion 
and coefficients of dispersion as done in Exs. 1 to 11 inclusive. Compare 
the results with results from the tables for Seattle. 

13. Find, by the direct method, the standard deviation of freshman 
weights from the data in Table XIII. Use the form of Table XLI. 

14. Find, by the short-cut method, the standard deviation of freshman
weights. Check with the result of Ex. 13. Use the form of Table XLII.

15. Find the value of δ/σ for freshman heights and for freshman weights.
Which one is nearest to the theoretical value of four-fifths? Was this to be
expected?


16. Make a table, showing for freshman weights the same set of division 
points as shown for heights in Table XLVI. 

17. Find the quartile measure and quartile coefficient of dispersion for 
freshman heights and for freshman weights. 

18. Find the standard deviation, average deviation from the arithmetic
mean, and quartile measure of dispersion of the leaf lengths shown in Table
[number illegible]. Obtain the corresponding coefficients of dispersion.

19. Find the standard deviation, average deviation from the arithmetic
mean, and quartile measure of dispersion of the leaf breadths shown in Table
[number illegible]. Obtain the corresponding coefficients of dispersion.





20. Determine the average deviation from the median leaf length found 
in Ex. 27, Chap. IV. Find the corresponding coefficient of dispersion. 

21. Determine the average deviation from the median leaf breadth found 
in Ex. 28, Chap. IV. Find the corresponding coefficient of dispersion. 

22. Using the table constructed in Ex. 19, Chap. II, find standard devi¬ 
ation of leaf lengths. Use both the direct method and the short-cut method. 
Find the corresponding coefficient of dispersion. 

23. Using the table constructed in Ex. 20, Chap. II, find standard devia¬ 
tion of leaf breadths. Use both the direct method and the short-cut method. 
Find the corresponding coefficient of dispersion. 

24. Using the table constructed in Ex. 19, Chap. II, find the quartile 
measure and the quartile coefficient of dispersion of leaf lengths. 

25. Using the table constructed in Ex. 20, Chap. II, find the quartile
measure and the quartile coefficient of dispersion of leaf breadths. 

26. Make a single table showing all the measures and coefficients of 
dispersion found in Exs. 20 to 25. Construct the table so that comparisons 
can be readily made. Which measure of dispersion of leaf lengths is 
greatest? Which least? Which coefficient of dispersion of leaf lengths is 
greatest? Which least? Answer the same questions in regard to leaf 
breadths. Which has the greater scatter, leaf lengths or leaf breadths? 
Which has the greater relative amount of scatter? 


27.

Table A.—Distribution of Labor Incomes of Farmers, 1910 to 1915¹

Classes of annual          Number in each     Total labor income,
labor income               income class       in millions

$-1,500 and less                 64,000            $ -96
 -1,500 to $-1,000               76,800              -96
 -1,000 to    -500              300,800             -226
   -500 to       0            1,529,600             -382
      0 to     500            2,336,000             +584
    500 to   1,000            1,132,800              850
  1,000 to   1,500              473,600              592
  1,500 to   2,000              217,600              381
  2,000 to   2,500               96,000              216
  2,500 to   3,000               64,000              176
  3,000 to   4,000               64,000              224
  4,000 to   5,000               19,200               86
  5,000 to  10,000               25,600              192
Over 10,000                           0                0

Totals                        6,400,000           $2,501

¹ From "Income in the United States," vol. II.







Table B.—Distribution of Labor Incomes of Farmers, 1916 to 1918¹

Classes of annual          Number in each     Total labor income,
labor income               income class       in millions

$-1,500 and less                 84,500            $-127
 -1,500 to $-1,000               65,000              -81
 -1,000 to    -500              266,500             -200
   -500 to       0              903,500             -226
      0 to     500            1,859,000             +465
    500 to   1,000            1,293,500              970
  1,000 to   1,500              708,500              886
  1,500 to   2,000              396,500              694
  2,000 to   2,500              299,000              673
  2,500 to   3,000              169,000              465
  3,000 to   4,000              201,500              705
  4,000 to   5,000               84,500              380
  5,000 to  10,000              117,000              877
Over 10,000                      52,000              520

Totals                        6,500,000           $6,001

¹ From "Income in the United States," vol. II.

Make additional columns to Tables A and B, providing the data for
plotting a Lorenz curve for each table. Assume that the average income
in each class is the midpoint of the class except for the first and last
classes. For the first class of -$1,500 and less assume that the average
for the class is -$1,500. For the last class of over $10,000 assume that
the average for the class is $10,000.

Having completed the tables, plot a Lorenz curve for each table, both
on the same set of axes. Plot the 1910 to 1915 curve in red and the 1916
to 1918 curve in black.

In which period was the distribution of income of farmers more uniform?
Does this show a greater degree of prosperity for farmers in that period?

What is the significance of the point where the curve crosses the line of 0
per cent of income?

If the tables were lost, how could you tell from the curve what per cent of
the farmers received negative income?









28.

Australian War Census of Incomes¹

                                     Males                    Females
Income class                  Number    Amount of      Number    Amount of
                                        income to                income to
                                        nearest                  nearest
                                        thousand                 thousand

Under £50                    145,513    £ 4,163       301,592    £ 6,717
£   50 and under £  100      327,835     24,308       168,106     11,416
   100 and under    150      448,195     55,090        52,929      6,250
   150 and under    156       46,630      7,093         3,651        558
   156 and under    200      157,350     27,219        12,697      2,211
   200 and under    300      106,324     25,191        11,001      2,641
   300 and under    500       49,108     18,388         6,617      2,498
   500 and under    750       15,928      9,603         2,691      1,633
   750 and under  1,000        6,313      5,393         1,145        970
 1,000 and under  1,500        4,933      5,994           905      1,089
 1,500 and under  2,000        2,132      3,676           364        629
 2,000 and under  3,000        1,707      4,149           317        772
 3,000 and under  4,000          659      2,249           102        361
 4,000 and under  5,000          375      1,685            58        258
 5,000 and over                  746      7,300            86        656

Total....

¹ From "Income in the United States," vol. II.

Make a table showing cumulative values and cumulative per cents both
for number and for amount of income for males and for females. From
the cumulative per cent columns plot points for a Lorenz curve for males.
On the same set of axes plot points for a Lorenz curve for females. Smooth
both graphs, using black for males and red ink for females. Place a proper
legend in the lower right-hand corner.

Is there a marked difference shown in distribution of wealth for males
and for females?

Is there more or less unequal distribution for females than for males?
What shows this?

What per cent of total wealth of each sex is possessed by the poorer half?

What per cent of the total wealth of each sex is possessed by the richest 5
per cent?

What per cent of the total wealth of each sex is possessed by the so-called
“middle half,” i.e., by the portion between 50 and 95 per cent?










CHAPTER VI 


SKEWNESS 

It was noted in the case of freshman heights that the frequency 
distribution was nearly symmetrical. This was evident from the
smoothed frequency curve of Fig. 31.

1. Skewness.—In the case of freshman weights there was a lack
of symmetry. The smooth curve of Fig. 29 tails off to the right 
of the ordinate to the high point farther than it does to the left. 
When it happens that the curve tails off farther to one side of 
the mode than to the other, skewness is said to be present. The 
distribution and the curve are said to be skewed. 

When the values of the variable are a set of observations on a 
single thing and there is no reason for thinking that positive devia¬ 
tions from the arithmetic mean are any more or any less likely 
to occur than negative deviations of the same size, a symmetrical 
distribution may be expected. Such an example would be a 
large number of measurements of the same line by a surveyor. 
But if the observations consist of a set of measurements of a single 
characteristic of a large number of objects, a certain degree of 
skewness is likely to be found. The measurements of the weights 
of 629 freshmen constituted such a set. So also did the measure¬ 
ments of the heights of these freshmen. In one case skewness 
was noticeable; in the other case, not. 

It is an important fact that in many cases skewness is suffi¬ 
ciently slight to be neglected. 

2. Effect on Mode, Median, and Arithmetic Mean.—If the 
frequency curve tails off farther to the right of the mode than 
to the left, it will almost invariably follow that more than half 
the area under the curve falls to the right of the ordinate at the 
mode. Since the ordinate at the median bisects the area under 
the curve, the median must then be shifted to the right and be 
larger than the mode. The ordinate at the arithmetic mean
passes through the center of gravity of the area under the curve. 
This will also almost invariably be to the right of the maximum 
ordinate if the curve tails off farther to the right than to the left. 
Then the arithmetic mean will be larger than the mode. The 



median is shifted to the right only by virtue of an increased num¬ 
ber of large items over what there would be if the part of the curve 
to the right of the mode were symmetrical with the part to the left. 
The center of area is shifted only because of added area to the 
right. In getting the position of the center of gravity each 
element of new area is multiplied by its distance from the vertical 
axis and added to the previous products. The greater the dis¬ 
tance from the axis, the greater the added product, and hence the 
greater effect in shifting the center of gravity. It follows that 
almost invariably the ordinate through the center of gravity of 
the area will be shifted farther from the maximum ordinate 
than is the ordinate through the center of area. In other words, 
in a skewed distribution tailing off farther to the right than to 
the left, the difference between the arithmetic mean and the 
mode is almost invariably greater than the difference between 
the median and the mode. If the distribution tails off farther to 
the left of the ordinate at the mode than to the right, the same 
reasoning leads us to expect the same relationship of differences 
among mode, median, and arithmetic mean. It has been found 
empirically that in many cases 

(a — Z) = 3 (a - M). 

In the case of freshman weights 

a — Z = 4.5, 
a — M = 1.4, 

and the above relation holds nearly. In many cases, however, 
the relation does not hold. 

3. Measure of Skewness.—A natural measure of skewness 
would be a number determined by the effect of the skewness. 
When there is symmetry, there is no skewness. Then arithmetic 
mean, median, and mode are all equal. Skewness separates these 
three statistical constants. The greater the skewness the 
greater the separation, usually. The amount the arithmetic 
mean is separated from the mode, that is, the deviation of the
arithmetic mean from the mode,

a − Z,

gives a measure of skewness. The deviation of the median
from the mode,

M − Z,

may also be used. So may

a − M,

the deviation of the arithmetic mean from the median.





4. Coefficient of Skewness.—Suppose that a − Z for one
variable equals a − Z for another variable, but that the variability,
dispersion, scatteration, in one case is greater than in the
other. It is evident that the deviation of arithmetic mean
from the mode is relatively less in the case of large dispersion
than in the case of small dispersion. So for a coefficient of
skewness a − Z should be divided by the measure of dispersion,
either standard deviation σ, or average deviation δₐ. Coefficient
of skewness is

$$\frac{a - Z}{\sigma}$$

or

$$\frac{a - Z}{\delta_a}.$$

The deviation of arithmetic mean from the mode seems to be
the most logical of the above measures, but it frequently happens
that the mode is not well defined. In that case a − M makes a
better measure for skewness. The corresponding coefficient is
either

$$\frac{a - M}{\sigma}$$

or

$$\frac{a - M}{\delta_a}.$$

5. Algebraic Sign.—If a > Z, skewness is positive, and the 
frequency curve usually tails off farther to the right than to the 
left. If a < Z, skewness is negative and the frequency curve
usually tails off farther to the left. 

6. Moments.—Another measure of skewness sometimes used
is the cube root of what is known as third moment about the
arithmetic mean. Moments about the arithmetic mean are defined
as follows:

$$\text{First moment} = \frac{\sum d_a}{N},$$

$$\text{Second moment} = \frac{\sum d_a^2}{N},$$

$$\text{Third moment} = \frac{\sum d_a^3}{N},$$

and so on for higher moments.






It might seem in the case of positive skewness with large positive
values of dₐ and small negative values that first moment
might measure skewness. It does not, because

$$\frac{\sum d_a}{N} = 0,$$

since Σdₐ always equals 0.

In second moment about the arithmetic mean, squares of the
deviations are used and an excess of positive or of negative
deviations would not be revealed. So

$$\frac{\sum d_a^2}{N}$$

is not a measure of skewness. Square root of second moment
about the arithmetic mean is standard deviation.

7. Third Moment a Measure of Skewness.—With third
moment about the arithmetic mean, however, an excess of positive
deviations will make Σdₐ³ positive and an excess of negative
deviations will make it negative. Deviations being cubed,
large deviations will have great weight as compared with small
ones. So that the greater the skewness, the greater will be the
third moment, positive if skewness is positive, and negative if
skewness is negative. As square root of second moment about
the arithmetic mean was used to give standard deviation, so
here cube root of third moment about the arithmetic mean is used
for the measure of skewness.

$$\text{Skewness} = \sqrt[3]{\frac{\sum d_a^3}{N}}$$

The corresponding coefficient of skewness would naturally be

$$\frac{\sqrt[3]{\dfrac{\sum d_a^3}{N}}}{\sqrt{\dfrac{\sum d_a^2}{N}}} = \frac{\text{cube root of third moment}}{\text{standard deviation}}.$$

This measure and coefficient are laborious to compute. However,
in the problem of fitting a mathematical curve to a given
frequency distribution, as high as the fourth moment is used.
Professor Karl Pearson has devised a measure of skewness
involving moments as high as the fourth.
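
The labor of computing the moments is easily handed to a machine. The sketch below, in Python, computes the first three moments about the arithmetic mean and from them the cube-root measure and coefficient of skewness defined above. The sample values are hypothetical and chosen only so that the arithmetic is easy to follow; the function names are likewise illustrative.

```python
import math


def moments_about_mean(values):
    """Return the first, second, and third moments about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    first = sum(deviations) / n
    second = sum(d ** 2 for d in deviations) / n
    third = sum(d ** 3 for d in deviations) / n
    return first, second, third


def signed_cube_root(x):
    """Real cube root that keeps the sign of x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def moment_skewness(values):
    """Cube root of third moment, and the same divided by standard deviation."""
    _, second, third = moments_about_mean(values)
    measure = signed_cube_root(third)
    coefficient = measure / math.sqrt(second)
    return measure, coefficient


sample = [1, 2, 3, 4, 10]            # hypothetical values, tail to the right
mirrored = [-v for v in sample]      # the same values with the tail to the left

print(moments_about_mean(sample))
print(moment_skewness(sample))
print(moment_skewness(mirrored))
```

For the sample the deviations are −3, −2, −1, 0, and 6, so the first moment is 0, the second is 10, and the third is 36; the measure of skewness is positive. Reversing the signs of the values reverses the sign of the third moment and hence of the measure, as the text describes.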

8. Quartile Measure of Skewness.—As was done for the measure
of dispersion, so with skewness, only the middle half of the array





may be used if one wishes to eliminate the influence of the 
extreme items. If there is symmetry of distribution 

$$M - Q_1 = Q_3 - M.$$

Suppose the symmetry is destroyed by adding n large items to
the array. Addition of items at the right-hand end of the array
is sure to shift the positions of both quartiles and the median to
the right. Q₁ is shifted n/4 items; M, 2n/4 items; and Q₃, 3n/4
items to the right. Consideration of the symmetrical frequency
curve shows that Q₁ is shifted to a region of greater frequency.
A movement over n/4 items will then change the size of Q₁
but little. M is shifted to a region of less frequency though the
frequency is not diminishing rapidly. A movement over 2n/4
items will change the size of M more than the size of Q₁ was
changed. Q₃ is shifted to a region of less frequency in a part of
the distribution where frequencies are diminishing more rapidly.
A movement over 3n/4 items will change the size of Q₃ more than
the size of M was changed. In consequence of these changes

$$M - Q_1 < Q_3 - M.$$

The difference between Q₃ and M has become greater than the
difference between Q₁ and M in consequence of the skewness
that has been introduced. The difference of these differences
then gives a measure of skewness in which the influence of the
size of extreme items has been eliminated. The quartile measure
of skewness then equals

$$(Q_3 - M) - (M - Q_1) = Q_3 + Q_1 - 2M.$$

If skewness had become negative instead of positive, the same
reasoning would have applied. M − Q₁ would be greater than
Q₃ − M, and the resulting measure of skewness would be negative.
A quartile coefficient of skewness may be obtained by
dividing by the entire interquartile range. Quartile coefficient
of skewness equals

$$\frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}.$$

All these measures and coefficients of skewness become zero, 
as they should, when the distribution is symmetrical. 
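
The effect of adding large items to a symmetrical array may be tried on a small scale. The following Python sketch uses a hypothetical array of nine values symmetrical about 15 and then adds n = 4 large items. It assumes the "inclusive" interpolation rule of the standard-library function statistics.quantiles for locating the quartiles, which need not agree exactly with the rule used elsewhere in this text.

```python
from statistics import quantiles


def quartile_skewness(values):
    """Return the quartile measure and quartile coefficient of skewness."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    measure = q3 + q1 - 2 * median
    coefficient = measure / (q3 - q1)
    return measure, coefficient


# Hypothetical array, symmetrical about 15.
symmetric = [10, 12, 13, 14, 15, 16, 17, 18, 20]

# The same array with n = 4 large items added at the right-hand end.
skewed = symmetric + [24, 27, 31, 36]

print(quartile_skewness(symmetric))
print(quartile_skewness(skewed))
```

In the symmetrical array the quartiles and median fall at 13, 15, and 17, and both measure and coefficient are zero. After the four items are added they fall at 14, 17, and 24: the first quartile has moved one item, the median two, and the third quartile three, giving a measure of 4 and a coefficient of 0.4.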

The probability distributions derived from the expansion of

$$(p + q)^n$$

are symmetrical for all values of n if, and only if, p = q. If
p ≠ q the distribution becomes asymmetrical, approaching
symmetry as n increases.





9. Skewness from Pure Chance.—The throwing of dice gives a
good example. If one die is thrown, the probability that an ace
will come up is 1/6, and the probability that an ace will not come up
is 5/6. The same theorems on probability that were applied to
the result of tossing coins apply here to show that the terms of
the expansion of

$$\left(\tfrac{5}{6} + \tfrac{1}{6}\right)^n$$

give, respectively, the probabilities of throwing no ace, one ace,
two aces, and so on, in throwing n dice once. If the n dice are
thrown N times, then N times each of these terms gives the
corresponding expected frequencies of no ace, one ace, two aces, and so
on.

10. Example.—Suppose seven dice are thrown 1,000,000 times.

$$\left(\tfrac{5}{6} + \tfrac{1}{6}\right)^7 = \frac{1}{6^7}\left(5^7 + 7 \cdot 5^6 \cdot 1 + 21 \cdot 5^5 \cdot 1^2 + 35 \cdot 5^4 \cdot 1^3 + 35 \cdot 5^3 \cdot 1^4 + 21 \cdot 5^2 \cdot 1^5 + 7 \cdot 5 \cdot 1^6 + 1^7\right)$$

$$= \frac{1}{279{,}936}\left(78{,}125 + 109{,}375 + 65{,}625 + 21{,}875 + 4{,}375 + 525 + 35 + 1\right).$$

Multiplying each term by 1,000,000 gives the expected frequencies
as shown in Table L.


Table L.—Probable Frequencies of Aces in 1,000,000 Throws of
Seven Dice

Number of aces        Frequency

      0                 279,000
      1                 391,000
      2                 234,000
      3                  78,300
      4                  15,700
      5                   1,870
      6                     126
      7                       4

    Total             1,000,000


Inspection of Table L, without plotting a curve, shows a fair
degree of symmetry if only the first three entries are considered.
Beyond that point, the distribution tails off rapidly to a very
small frequency in the case of the largest number of aces possible.
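
The expansion of Art. 10 may be checked mechanically. The sketch below, in Python, forms each term of the expansion as an exact fraction and multiplies by the number of throws; the function name ace_probabilities is illustrative only.

```python
from fractions import Fraction
from math import comb


def ace_probabilities(n_dice):
    """Terms of (5/6 + 1/6)^n: probabilities of 0, 1, ..., n aces."""
    no_ace, ace = Fraction(5, 6), Fraction(1, 6)
    return [comb(n_dice, k) * no_ace ** (n_dice - k) * ace ** k
            for k in range(n_dice + 1)]


probabilities = ace_probabilities(7)

# Numerators over the common denominator 6^7 = 279,936.
numerators = [p * 6 ** 7 for p in probabilities]

# Expected frequencies in 1,000,000 throws of the seven dice.
expected = [float(p * 1_000_000) for p in probabilities]

for aces, frequency in enumerate(expected):
    print(aces, round(frequency))
```

The numerators reproduce the eight terms in the parenthesis of Art. 10. The expected frequencies agree with Table L to about three significant figures; the tabulated entries appear to have been rounded so that they total exactly 1,000,000.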






11. Measures of Skewness of Freshman Weights.—Referring
to the measurements of freshman weights,

a = 142.25.
Z = 137.8.
M = 140.5.
Q₁ = 129.4.
Q₃ = 153.2.
σ = 17.9.

Measure of skewness,

$$a - Z = 4.4.$$

Coefficient of skewness,

$$\frac{a - Z}{\sigma} = 0.24.$$

Quartile measure of skewness,

$$Q_3 + Q_1 - 2M = 1.6.$$

Quartile coefficient of skewness,

$$\frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = 0.07.$$
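
The four results above follow directly from the six constants. A minimal Python sketch of the arithmetic is given here; the function and key names are illustrative only.

```python
def skewness_measures(a, Z, M, Q1, Q3, sigma):
    """Measures and coefficients of skewness from the statistical constants."""
    quartile_measure = Q3 + Q1 - 2 * M
    return {
        "a - Z": a - Z,
        "(a - Z)/sigma": (a - Z) / sigma,
        "quartile measure": quartile_measure,
        "quartile coefficient": quartile_measure / (Q3 - Q1),
    }


# Constants for freshman weights as given in Art. 11.
weights = skewness_measures(a=142.25, Z=137.8, M=140.5,
                            Q1=129.4, Q3=153.2, sigma=17.9)

for name, value in weights.items():
    print(f"{name}: {value:.3f}")
```

Carried to three decimals the constants as printed give 4.450, 0.249, 1.600, and 0.067. The small differences from the 4.4 and 0.24 stated above presumably arise from rounding in the tabulated constants.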


12. Significant Degree of Skewness.—The use of certain statis¬ 
tical processes depends on the distribution being normal, or 
approximately so. Skewness measures the degree of lack of 
symmetry. The question as to how great skewness may be 
before these processes cease to have significance is not fully 
settled. It is safe to say, however, that if the measure of 
skewness is more than four times its probable error, the distribu¬ 
tion is sufficiently far from normal distribution to destroy the 
reliability of these processes. The meaning of “probable error” 
will be discussed presently. 

There are many distributions, particularly among biological 
measurements, which follow the normal distribution approxi¬ 
mately. In economics the normal distribution is much less 
likely to occur. It also frequently happens that a distribution is 
very much skewed, but the logarithms of the values of the vari¬ 
able have nearly a normal distribution. This gives a point of 
attack for studying the distribution. 

13. Extreme Forms.—There are extreme types of asymmetry. 
The greatest frequencies may occur with the very smallest sizes 
and decrease as the size of item increases. Or, the small sizes of 
the variable may have very low frequencies increasing with 
increase of the size of the variable. For these types the frequency 





curve does not go up to a high point and then descend, but is high
at one end and low at the other. Such distributions are said to
be “J-shaped.” There are cases in which high frequencies occur
both for small sizes and for large sizes of the variable, with low
frequencies for the medium sizes. The curve is high at both
ends and low in the middle. Such distributions are called
“U-shaped.” Examples of J- and U-shaped distributions are
given by Yule.¹

Exercises 

1. Find a — Z for freshman heights. Using this as a measure of skewness, 
which has the greater skewness, the distribution of freshman heights, or of 
freshman weights? In making the comparison, should the measure of 
skewness be used or the coefficient of skewness? Using a — Z as a measure 
of skewness, find the coefficient of skewness of freshman heights. Compare 
this with the coefficient of skewness of freshman weights. 

2. Find the quartile measure of skewness of freshman heights. How does 
this compare with the corresponding measure for freshman weights? Would 
it be expected that these measures would be larger or smaller than the 
corresponding a — Z measures? Why? 

3. How do the quartile coefficients of skewness of freshman heights and 
freshman weights compare? 

4. Find a — Z for the leaf lengths of Ex. 19, Chap. II, and a — Z for 
the leaf breadths of Ex. 20, Chap. II. Using a — Z as a measure of skewness, 
which has the greater skewness, the distribution of leaf lengths or that of 
leaf breadths? 

5. Find the quartile measure of skewness for leaf lengths of Ex. 19, Chap.
II, and for leaf breadths of Ex. 20, Chap. II. 

¹ G. Udny Yule, “An Introduction to the Theory of Statistics,” pp. 98-105.



CHAPTER VII 


PROBABLE ERROR 

The meaning of the term “probable error” will now be 
ascertained. 

1. Definition of Probable Error.—The interquartile range
(Q₃ − Q₁) includes exactly half the items in the array. Suppose
each value of the variable is written on a card and placed in a
sack. Shake up the sack and draw a card at random. The
chances are even that the value on the card drawn will be inside
the interquartile range, since exactly half the values are within
this range and half outside. The probability of drawing a value
within the interquartile range is 1/2. If, now, the distribution
follows the normal probability curve,

$$Q_3 - M = M - Q_1$$

equals the semi-interquartile range. Moreover,

$$M = a,$$

and the semi-interquartile range is the quartile deviation from the
arithmetic mean. This quartile deviation from the arithmetic
mean is known as the probable error of the series of values of the
variable. Probable error of a set of observations may then be
defined as that deviation from the arithmetic mean on either side
within which exactly half the observations lie. If one of the
observations be selected at random, it is as likely as not that it
will differ from the arithmetic mean by not more than the probable
error.

It can be shown that

$$\text{p.e.} = 0.67\sigma$$

nearly.

2. x/σ and Normal Probability Curve.—In the discussion of
symmetric distributions in Chap. III, pages 77 to 83, x/σ was used
as the unit of measure on the axis of the independent variable.
Let it be recalled that x is deviation from the arithmetic mean and
σ is standard deviation. The equation of the normal probability
curve has been derived and found to be

$$y = y_0 e^{-\frac{x^2}{2\sigma^2}}$$



as was referred to on page 82, Chap. III. The areas under the
curve from the y-axis up to any ordinate have been tabulated for
varying values of x/σ.¹ The y-axis goes through the highest
point of the curve and so is at the mode. From the symmetry of
the curve, mode equals arithmetic mean equals median. So the
y-axis is at the arithmetic mean and at the median. Then x,
being measured from the arithmetic mean, is deviation from
arithmetic mean. The area under the curve from the left-hand
end up to any ordinate is the total frequency up to the size of item
at that ordinate. The area up to the y-axis is half of the total
area. The area from the y-axis up to the third quartile must be
one-quarter of the entire area, since one-quarter of the total
frequency is included between the median and the third quartile.

3. Areas under Normal Probability Curve.—Table LI is made
from a table of areas under the normal probability curve. The
entire area is taken as 1. If N = 100,000, then 100,000 times
the entries in the second column gives the total frequency up to
the ordinate at x/σ. Similarly for the other columns.


Table LI.—Areas under the Normal Probability Curve

   x/σ       Fraction of area    Fraction of area    Fraction of area     1 − a
             up to the           from y-axis up      between the          (fraction of
             ordinate at x/σ     to ordinate at      ordinates at         area outside
                                 x/σ                 −x/σ and +x/σ,       these
                                                     a                    ordinates)

 0              0.50000            0                   0                  1
 0.67449        0.75000            0.25000             0.50000            0.50000
 1              0.84134            0.34134             0.68269            0.31731
 2              0.97725            0.47725             0.95450            0.04550
 3              0.99865            0.49865             0.99730            0.00270
 4              0.99997            0.49997             0.99994            0.00006

x/σ = unit of measure on the x-axis.
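
The entries of Table LI may be recomputed rather than copied from published tables. The sketch below, in Python, uses the standard-library class statistics.NormalDist for the area up to an ordinate and derives the other three columns from it; the function name table_li_row is illustrative only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # arithmetic mean 0, standard deviation 1


def table_li_row(x_over_sigma):
    """Return the four tabulated fractions for one value of x/sigma."""
    up_to = STANDARD_NORMAL.cdf(x_over_sigma)   # area from the left end
    from_axis = up_to - 0.5                     # area from the y-axis
    between = 2.0 * from_axis                   # area between -x and +x
    outside = 1.0 - between                     # area beyond -x and +x
    return up_to, from_axis, between, outside


for ratio in (0, 0.67449, 1, 2, 3, 4):
    print(ratio, *(f"{value:.5f}" for value in table_li_row(ratio)))
```

Each column after the first follows from the symmetry of the curve: half the area lies to the left of the y-axis, and the area between −x and +x is twice the area from the y-axis to x.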


One-fourth of the entire area, 0.25000, from the y-axis is at

$$\frac{x}{\sigma} = 0.67449$$

¹ James W. Glover, “Tables of Applied Mathematics.” Karl Pearson,
“Tables for Statisticians and Biometricians.” Table II gives the entire 
area from the left end up to the ordinate in question. The tabular values 
may be obtained by a process known as integration between limits. 





and

$$x = 0.67449\sigma.$$

As this value of x is at a quarter of the entire frequency from the
middle of the array, it is the value of the deviation of the third
quartile from the arithmetic mean, which here is also its deviation
from the median.

By definition, the deviation of third quartile from the arithmetic
mean is probable error. Hence the relation,

$$\text{p.e.} = 0.67\sigma,$$

nearly. This is semi-interquartile range.

4. Skewed Distribution.—If the distribution is skewed, the 
deviation of the third quartile from the arithmetic mean is no 
longer the semi-interquartile range. In such case probable error 
is simply defined as being 

$$\text{p.e.} = 0.67449\sigma.$$

Thus p.e. simply becomes a measure of variation equal to about 
two-thirds of standard deviation. 
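
The factor 0.67449 is simply the value of x/σ at the third quartile of the normal curve, and may be recovered by asking for the ordinate that has three-quarters of the area to its left. The following Python sketch does so with the standard-library class statistics.NormalDist; the names PE_OVER_SIGMA and probable_error are illustrative only, and the standard deviation used is that of freshman weights from Chap. VI.

```python
from statistics import NormalDist

# x/sigma at the third quartile of the normal curve: the ordinate that
# has three-quarters of the area to its left.
PE_OVER_SIGMA = NormalDist().inv_cdf(0.75)


def probable_error(sigma):
    """Probable error corresponding to a standard deviation sigma."""
    return PE_OVER_SIGMA * sigma


# Standard deviation of freshman weights, 17.9, from Chap. VI.
pe_weights = probable_error(17.9)

print(round(PE_OVER_SIGMA, 5), round(pe_weights, 1))
```

For freshman weights this gives a probable error of about 12.1. Since that distribution is skewed, the figure is to be read in the sense of Art. 4, as a measure of variation equal to about two-thirds of standard deviation.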

Always exactly half the observations lie between the first and 
third quartiles. Only in case of normal probability distribution 
do exactly half the observations lie between x = —p.e. and x = 
+p.e. Values of a large number of measurements of a single 
object are likely to follow normal probability distribution quite 
closely. No significant error would then be expected in assuming 
that half the observations lie between arithmetic mean minus 
probable error and the arithmetic mean plus probable error. 

5. Chances of a Random Selection Falling beyond Certain
Limits.—In the last column of Table LI is tabulated the fraction
of the total frequency that will fall outside the limits of −x/σ to
+x/σ. In the next to the last column is tabulated the fraction of
the total frequency that falls between the limits −x/σ to +x/σ.
At x/σ = 0.67449 half the items fall outside these limits and half
within. So in choosing an item at random the chances are 1:1 of
its falling between these limits. At x/σ = 1 there are 0.31731 of
the items outside of x = −σ to x = +σ, and 0.68269 of the items
between the limits x = −σ to x = +σ. Since 0.31731 of the items
is about one-third of the total, in choosing an item at random, the
chances of getting one differing from the arithmetic mean by not
more than σ are about 2:1. In the same way, at x/σ = 2, 0.04550
of the items fall outside the interval from x = −2σ to x = +2σ.
This is about one-twenty-second of the total number of items. So




in choosing an item at random from the entire set, the chances are
about 21:1 of getting one differing from the arithmetic mean by
not more than 2σ. Following on down the column, it is seen that
the chances of getting an item at random with deviation from
arithmetic mean of not more than 4σ are about 16,000:1. In other
words only about 1/16,000 of the entire area under the curve is
not included between the ordinates at x = −4σ and x = +4σ.
A deviation of more than 4σ can scarcely happen by influence of
chance alone.

The size of standard deviation may thus be used as a means of 
measuring the degree of reliability of a set of observed values of a 
variable. Custom has introduced the convention of using prob¬ 
able error instead of standard deviation. Since 

$$\text{p.e.} = 0.67449\sigma,$$

it is as definite a measure as is standard deviation itself. Prob¬ 
able error has the simplicity of being that deviation from the 
arithmetic mean such that, if an item is selected at random from 
the entire set, the chances are even that its deviation from the arith¬ 
metic mean will not be more than the probable error. It must be 
remembered that this statement is strictly true only in case of a 
normal probability distribution. 

Now tabulate areas under the normal probability curve in
terms of probable error instead of standard deviation, as shown in
Table LII.


Table LII.—Areas under the Normal Probability Curve

x/p.e. = unit of measure on the x-axis

x/p.e.    Fraction of area    Fraction of area    Fraction of area    Odds against the
          from y-axis to      between the         outside the         occurrence of a
          ordinate at         ordinates at        ordinates at        deviation greater
          x/p.e.              −x/p.e. and         −x/p.e. and         than x,
                              +x/p.e.,            +x/p.e.,            A ÷ (1 − A)
                              A                   1 − A






6. The Limits in Terms of Probable Error.—Examine the
meaning of these figures at, say, x = 3(p.e.). Let N = 100,000.
The total frequency from the arithmetic mean up to a deviation
from the arithmetic mean of three times the probable error is
47,848. The total frequency included between deviations of
three times the probable error on each side of the arithmetic mean
is 95,696. The total frequency outside these limits is 4,304. If,
then, an item is selected at random from the entire set, the
probability of getting one differing from the arithmetic mean by more
than three times the probable error is 4,304 ÷ 100,000 =
0.04304, or about 4.30 in 100. The odds against getting an item
having a deviation from the arithmetic mean of more than three
times the probable error are 95,696 ÷ 4,304, or about 22:1.
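The figures quoted at x = 3(p.e.) can be checked numerically. The following is a minimal sketch, assuming a normal distribution and the relation p.e. = 0.67449σ given above; the function names are illustrative and do not come from the text.

```python
import math

PE_PER_SIGMA = 0.67449  # probable error expressed in units of sigma


def area_within(k_pe):
    """Fraction of a normal distribution lying within k_pe probable errors of the mean."""
    z = k_pe * PE_PER_SIGMA            # convert the deviation to units of sigma
    return math.erf(z / math.sqrt(2))  # area between -z and +z


def odds_against_exceeding(k_pe):
    """Odds A : (1 - A) against a deviation greater than k_pe probable errors."""
    a = area_within(k_pe)
    return a / (1.0 - a)


N = 100_000
A = area_within(3)
print(round(N * A))                         # frequency inside the limits
print(round(N * (1 - A)))                   # frequency outside the limits
print(round(odds_against_exceeding(3), 1))  # roughly 22 to 1
```

The printed frequencies should agree with 95,696 and 4,304 to within rounding of the tabulated areas.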

7. Probable Error of Various Constants.—Probable errors of
various statistical constants have been computed by mathematical
processes not necessary to reproduce here. Table LIII
gives a number of such values.


Table LIII.—Probable Errors of Various Statistical Constants

No.  Constant                                           Probable error

1    Arithmetic mean a of N observations                0.6745 σ/√N
2    Median M of N observations                         0.8454 σ/√N
3    Standard deviation σ of a normal distribution      0.6745 σ/√(2N)
4    Coefficient of correlation r, normal
     distribution                                       0.6745 (1 − r²)/√N
5    Correlation ratio η, approximately                 0.6745 (1 − η²)/√N
6    Any quantity in terms of standard deviation σ
     of that quantity                                   0.6745 σ
7    Semi-interquartile range                           0.5306 σ/√N
8    Difference between two averages a₁ − a₂,
     whose probable errors E₁ and E₂ are known          √(E₁² + E₂²)


In explanation of these formulas take, say, No. 6. If N
determinations are made of the value of a quantity, the standard
deviation may be computed. The probable error is 0.6745 times
the standard deviation. Take No. 1 as another illustration.
If the arithmetic mean of each of a number of sets of N
observations each is determined, this set of means has a standard
deviation and probable error. This is the probable error of
the arithmetic mean of a set of N observations and is found to be
0.6745 σ/√N.

8. Examples.—The probable error of a quantity is written
immediately after the value of the quantity with a ± sign
between them. The probable error of the arithmetic mean of
freshman heights is

$$0.6745\,\frac{\sigma}{\sqrt{629}} = 0.067.$$

The arithmetic mean of freshman heights would then be
recorded as

a = 68.18 ± 0.067.

This means that if a large number of groups of 629 freshmen
each were measured and the arithmetic mean height of each
group were computed, one would expect to find about half of these
means to fall between 68.18 − 0.07 = 68.11 and 68.18 + 0.07 =
68.25. The chances are even that the correct arithmetic mean
differs from 68.18 by not more than 0.07.

The chances are very small that the true arithmetic mean differs
from 68.18 by more than 0.27, or four times the probable error.
In other words, if many similar samples of freshman heights
were taken, the chances are very small of any arithmetic mean
height being less than 68.18 − 0.27 or greater than 68.18 + 0.27,
that is, less than 67.91 or more than 68.45 in.
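The limits quoted for freshman heights follow directly from the mean and its probable error. The sketch below reproduces them; the helper names are illustrative, and the value sigma = 2.5 in the last line is a hypothetical standard deviation chosen only to show formula No. 1 of Table LIII at work, not a figure taken from the text.

```python
import math


def probable_error_of_mean(sigma, n):
    """Formula No. 1 of Table LIII: 0.6745 * sigma / sqrt(N)."""
    return 0.6745 * sigma / math.sqrt(n)


def limits(mean, pe, k=1):
    """Interval of k probable errors on each side of the mean."""
    return mean - k * pe, mean + k * pe


mean, pe = 68.18, 0.067        # values quoted in the text
lo1, hi1 = limits(mean, pe, 1)  # even-chance limits
lo4, hi4 = limits(mean, pe, 4)  # limits at four times the probable error
print(round(lo1, 2), round(hi1, 2))
print(round(lo4, 2), round(hi4, 2))

# Hypothetical sigma, for illustration only.
print(round(probable_error_of_mean(2.5, 629), 3))
```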

A good illustration of the practical application of probable
error is given by Professor Pearl.¹ Suppose the average pulse
rate of 150 people is found to be 79.68 ± 0.15 beats per minute.
After administering a certain drug to each person, the average
pulse rate is found to be 81.12 ± 0.20 beats per minute. May
this increase of pulse rate be regarded as quite surely due to
the drug, or may it be a result of chance due to the sample?

The difference of the two averages is

$$a_1 - a_2 = 81.12 - 79.68 = 1.44.$$

The probable errors are E₁ = 0.15 and E₂ = 0.20. From No.
8 in Table LIII

$$\text{p.e.}(a_1 - a_2) = \sqrt{(0.15)^2 + (0.20)^2} = 0.25.$$

¹ Raymond Pearl, “Medical Biometry and Statistics,” p. 214.





The difference of the averages is then written

$$a_1 - a_2 = 1.44 \pm 0.25.$$

While the difference of 1.44 beats per minute is not very great,
yet it is nearly six times its probable error. From Table LII the
odds against a deviation six times the probable error occurring
from chance alone are about 20,000:1. In other words, the
chance of a difference of

1.44 ± 0.25

heart beats per minute not being significant is only about 1 in
20,000. This makes it very certain that the drug administered
had significant effect in increasing the average number of heart
beats per minute.
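The arithmetic of the pulse-rate example can be retraced as follows. This is a sketch under the assumption of a normal distribution; the function names are illustrative, and the odds are computed from the normal curve rather than read from Table LII.

```python
import math


def pe_of_difference(e1, e2):
    """Formula No. 8 of Table LIII: probable error of a difference of two averages."""
    return math.sqrt(e1 ** 2 + e2 ** 2)


def odds_against(k_pe):
    """Odds against a chance deviation exceeding k_pe probable errors."""
    z = k_pe * 0.67449                       # deviation in units of sigma
    outside = math.erfc(z / math.sqrt(2))    # area beyond -z and +z
    return (1.0 - outside) / outside


diff = 81.12 - 79.68
pe_diff = pe_of_difference(0.15, 0.20)
ratio = diff / pe_diff                       # difference in units of its probable error
print(round(diff, 2), round(pe_diff, 2), round(ratio, 2))
print(round(odds_against(6)))                # of the order of 20,000 to 1
```

The ratio comes out a little under six, which is why the text speaks of "nearly six times" the probable error.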

The meaning of the statement on page 175 in regard to skewness 
being more than four times its probable error should now be 
understood. Unless a deviation is more than four times its 
probable error, there is no great reason for thinking it may not 
result from pure chance. 

9. Value of Probable Error Depends on N.—From each of
the formulas given in Table LIII, it is seen that the greater N is
the smaller the probable error. So the reliability of a result,
other things being equal, increases with increase of the number of
observations made, when the distribution is approximately
normal. The dividing line between large and small numbers of
observations is indefinite. The problem of distribution in small
samples is an important one.¹

10. Use of Probable Error. —Probable error serves as a measure 
of the extent to which a quantity may deviate from the average, 
due to chance alone. It thus becomes of great importance in 
connection with all observed measurements. 


Exercises 

1. Find the probable error of the arithmetic mean freshman height and the
standard deviation of freshman height. Are these probable errors
sufficiently large to make the arithmetic mean or standard deviation unreliable?

2. If another group of 629 freshmen were selected, how much might the
arithmetic mean freshman height of this group differ from the arithmetic
mean of the sample of 629 used in this book without making it reasonably
certain that this new group should not be used for studying average height
of the freshman population of which our sample is representative?

¹ Karl Pearson, “On the Distribution of Standard Deviations of
Small Samples,” Biometrika, vol. 10, 1915.




3. Find the probable error of the arithmetic mean of mean monthly 
temperatures at Seattle, and the probable error of the standard deviation of 
monthly precipitation at Seattle. 

4. Find the probable error of the arithmetic mean of the number of heads 
per throw of the seven dimes of Table XV. 

5. Find the probable error of the arithmetic mean leaf length determined
from the table of Ex. 19, Chap. II.

6. Find the probable error of the arithmetic mean leaf breadth
determined from the table of Ex. 20, Chap. II.



CHAPTER VIII 


CURVE FITTING 

In studying frequency graphs an attempt was made to draw a
smooth curve which was judged to be fairly representative of the
frequency distribution of the entire universe from which the
sample had been selected. In other words, the attempt was made
graphically to pass from the properties of the random sample to
the corresponding properties of the universe from which it was
selected. It was discovered in the case of heights of freshmen
that the frequency polygon fell fairly close to a symmetric curve
determined by the terms of the expansion of

$$N\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n.$$

Thus was found a definite mathematical function which
followed quite closely the variations in freshman heights. Moreover,
the polygon determined by the terms of the expansion of

$$N\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n$$

approaches a definite curve as a limit as n increases without limit.
This curve, the normal probability curve, gives an equational
relationship between the independent variable x and the function
y. This equation,

$$y = y_0 e^{-x^2/(2\sigma^2)},$$

enables us to determine y for any given value of x. It will be
recalled that x stands for deviation from the arithmetic mean,
and y is the corresponding frequency in a normal probability
frequency distribution. From the above discussion it would
seem that this equation could be used to give a fair approximation
to the frequency to be expected for any given freshman height
among a universe of freshman heights for which the given sample
is representative.

In the case of freshman weights, the frequency curve as drawn
did not prove so nearly symmetrical. There should be some
means of deciding whether a normal probability curve fits
sufficiently close for practical purposes. Perhaps a curve
determined by the terms of the expansion of

$$(p + q)^n,$$

where p does not equal q, could be found which would fit more
closely. If it is possible to get a mathematical relationship
between two variables, then a better analysis may be made than
if there is no apparent relationship.

1. Curve of Closest Fit.—This leads at once to the idea of 
determining the curve of closest fit to a given set of data, of such 
a nature that it may be expected fairly to represent the entire 
universe from which the data are selected. The kind of curve to 
use in any given instance must be left largely to that good 
judgment which is the result of experience. When a set of points 
is plotted from a given set of observed data, it is quite likely to 
be noticeable that these points tend to arrange themselves in a 
definite manner. In other words, there appears to be some 
definite law governing the distribution of the points. Certain 
considerations help in selecting the type of curve whose equation 
is a statement of this law. It is then necessary to determine that 
particular curve of this type which is the curve of closest fit to 
the points plotted. Probably the curve will not pass through all 
the points plotted, due to the accidental errors of observation 
or the accidental deviations in selecting a random sample. The
idea of the curve is to bring out the general law, independent of
accidental variations. It is always possible to write the equation
of a curve passing through all the points. Its equation will, in
general, have as many constants as there are points through
which the curve is to pass. This not only makes the equation
unwieldy, if there are many points, but destroys the very purpose
of bringing out the general tendencies as shown by the required
law.

2. Certain Type Forms.—Experience has shown that in most
cases the laws of nature can be expressed by means of a few type
forms of functions with relatively few constants involved. The
straight line and the parabola are two of the forms frequently
occurring. The equation of the straight line is

y = ax + b.

That of the parabola is

y = ax² + bx + c.





So-called parabolas of the nth degree with equations of the form

$$y = ax^n,$$

and the exponential function with equation of the form

$$y = ae^{bx},$$

are also not infrequent. These four types will now be considered.
There may also be mentioned other second-degree equations
which include all conic sections, different forms of trigonometric
equations, logarithmic equations, and equations which are
combinations of the functions given.

3. Straight Line.—The straight line is one of the simplest forms
to fit to a given set of data. There are several ways of getting
the straight line of closest fit.

If only two points are given, a straight line can always be passed
through them by substituting the coordinates of each point for
x and y in the equation

y = ax + b

and solving the two resulting equations for a and b. The values
of a and b thus found, substituted in the original equation, give
the equation sought. For example, suppose that for x = 1, y = 3,
and for x = 5, y = 11.

Then the two equations are

3 = a + b
11 = 5a + b.

Solving for a and b gives

a = 2, b = 1.

The required equation is, then,

y = 2x + 1.

The graph is shown in Fig. 46.
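The two-point case can be written as a short routine. This is only a sketch of the elimination carried out above; the function name `line_through` is illustrative.

```python
def line_through(p, q):
    """Return slope a and intercept b of the line y = a*x + b through points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("the two points have the same x; the slope is undefined")
    a = (y2 - y1) / (x2 - x1)  # subtracting the two equations eliminates b
    b = y1 - a * x1            # substitute a back into the first equation
    return a, b


a, b = line_through((1, 3), (5, 11))
print(a, b)  # 2.0 1.0
```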

Fig. 46.—Straight line through two points.

4. Graphical Method.—Take the following values of x and y:

x    0   5  10  15  20  25  30  35  40
y   10  15  20  27  28  35  41  44  50

These points are plotted as shown in Fig. 47. A thread may now be
stretched along the plotted points and shifted until it seems to be
in the position of closest fit to the points. When placed, its
position is marked by a straight line. This will be the line
required. This may be called the graphical method. It requires
good judgment in placing the thread.



Fig. 47.—Straight line fitted by averaging.


5. Method of Averaging.—A method in common use is known
as the method of averaging. Substitute the coordinates of each
point for x and y in the equation

y = ax + b.

If there are n points, this will give n equations in a and b.
Unless the plotted points fall exactly on a straight line, these
equations will not all be consistent. Divide the n equations into
two groups with as nearly the same number of equations in each
group as possible. Set up two equations in a and b such that the
coefficient of a in the first equation is the arithmetic mean of
the coefficients of a in the first group, the coefficient of b the
arithmetic mean of the coefficients of b, and the constant term
the arithmetic mean of the constant terms. Make the coefficient
of a in the second equation the arithmetic mean of the coefficients
of a in the second group, the coefficient of b the arithmetic mean of
the coefficients of b, and the constant term the arithmetic mean
of the constant terms. Solve these two equations for a and b.
Substitute these values of a and b in the original equation.

Since multiplying both sides of an equation by the same number
does not destroy equality, this method may be shortened by
adding the equations of the first group to obtain the first equation
and adding those of the second group for the second equation.


















































Setting up the equation for the pairs of values of x and y given
on page 187 gives

10 = b
15 = 5a + b
20 = 10a + b
27 = 15a + b
28 = 20a + b
35 = 25a + b
41 = 30a + b
44 = 35a + b
50 = 40a + b

Adding the first five equations for the one and the last four for
the other equation gives

100 = 50a + 5b
170 = 130a + 4b.

Solving for a and b gives

a = 1, b = 10.

The equation sought is, then,

y = x + 10.

This is a straight line with slope equal to 1 and y-intercept
equal to 10. It is the full line drawn in Fig. 47.

A different grouping of the equations may give a somewhat
different result. This is a defect of the method. For example,
take the same set of equations and let the first group consist of
the first four equations and the second group the last five.
The resulting equations to solve for a and b are

72 = 30a + 4b
198 = 150a + 5b.

The resulting values of a and b are

a = 0.96, b = 10.8.

This gives

y = 0.96x + 10.8

for the required line. This has not quite as much slope as the
first line found, while its y-intercept is greater. It is the dotted
line shown in Fig. 47.

If three constants are required, as for the parabola

y = ax² + bx + c,

the equations may be divided into three groups and the same
method followed. Of course, there is the same defect in the
method as shown above for the straight line.
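Both groupings of the method of averaging can be reproduced with a few lines of code. The sketch below uses the shortened form (adding the equations of each group); the function name and the `split` parameter, which gives the number of equations placed in the first group, are illustrative.

```python
xs = [0, 5, 10, 15, 20, 25, 30, 35, 40]
ys = [10, 15, 20, 27, 28, 35, 41, 44, 50]


def fit_by_averaging(xs, ys, split):
    """Add the equations y = a*x + b in two groups and solve the resulting pair."""
    sx1, sy1, n1 = sum(xs[:split]), sum(ys[:split]), split
    sx2, sy2, n2 = sum(xs[split:]), sum(ys[split:]), len(xs) - split
    # Solve  sy1 = a*sx1 + n1*b  and  sy2 = a*sx2 + n2*b  by Cramer's rule.
    det = sx1 * n2 - sx2 * n1
    a = (sy1 * n2 - sy2 * n1) / det
    b = (sx1 * sy2 - sx2 * sy1) / det
    return a, b


a1, b1 = fit_by_averaging(xs, ys, 5)  # first five equations, then last four
a2, b2 = fit_by_averaging(xs, ys, 4)  # first four equations, then last five
print(round(a1, 2), round(b1, 2))
print(round(a2, 2), round(b2, 2))
```

Changing `split` is all that is needed to see how the result depends on the grouping, which is the defect noted in the text.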





6. Least Squares Line.—A third method is by means of least
squares. Let the equation of the required line be

y = ax + b,

in which the slope a and the y-intercept b are determined from the
observed pairs of values of x and y. Take one of the given pairs
of values, say, $x_i$ and $y_i$. These numbers are coordinates of the
point $(x_i, y_i)$. If this point does not happen to be exactly on
the required line, then $ax_i + b$ does not exactly equal $y_i$, but will
equal the ordinate y of the point on the line whose abscissa is $x_i$.
There is a difference, then, between the observed ordinate $y_i$
and the adjusted ordinate y.

$$y_i - y = y_i - (ax_i + b)$$

There is such a difference for each pair of observed values of x
and y. The difference will be zero for those points, and only
those points, which fall on the adjusted line.

7. The Principle Involved.—The principle of least squares 
states that the line of best fit is such a line as to make the sum of 
the squares of these differences a minimum. In other words, the 
sum of the squares of the deviations of observed values of y from the 
corresponding values of y for the line of best fit must be less than 
the sum of the squares of the deviations from the corresponding 
values of y for any other line. 

8. The Principle Applied.—For example, take the points $B_1$,
$B_2$, $B_3$, and $B_4$ in Fig. 48, plotted from a set of observations,
$A_1B_1 = y_1$, $A_2B_2 = y_2$, etc. Let the line MN have the equation

y = ax + b.

The corresponding values of y are $A_1C_1$, $A_2C_2$, etc. The
deviations of the ordinates of the points $B_1$, $B_2$, etc. from the
corresponding ordinates on the line MN are

$$A_1B_1 - A_1C_1 = y_1 - y = y_1 - (ax_1 + b),$$

$$A_2B_2 - A_2C_2 = y_2 - y = y_2 - (ax_2 + b).$$

The principle of least squares states that if MN is the line of
closest fit, then

$$[y_1 - (ax_1 + b)]^2 + [y_2 - (ax_2 + b)]^2 + [y_3 - (ax_3 + b)]^2 + [y_4 - (ax_4 + b)]^2$$

must be less than the corresponding sum of squares of deviations
from any other line. The problem is to select the parameters
a and b so as to make the sum of the squares of the deviations a
minimum. Represent the sum of the squares by the letter u.





The principles of differential calculus state that to make u a
maximum or a minimum the derivative of u with respect to a must be
zero and the derivative of u with respect to b must be zero.
So, form these two derivatives, equate each to zero and solve
for the unknowns, a and b.¹



Fig. 48.—Least squares line.

Since the differences are each squared,

$$[y_1 - (ax_1 + b)]^2 = (ax_1 + b - y_1)^2.$$

Merely for convenience in writing they will be written in the
latter form. Assume there are N points for which the line is to
be determined.

Then

$$u = (ax_1 + b - y_1)^2 + (ax_2 + b - y_2)^2 + \dots + (ax_N + b - y_N)^2 = \sum_{i=1}^{N} (ax_i + b - y_i)^2.$$

The derivative of u with respect to b is

$$\frac{\partial u}{\partial b} = 2(ax_1 + b - y_1) + 2(ax_2 + b - y_2) + \dots + 2(ax_N + b - y_N) = 2\sum_{i=1}^{N} (ax_i + b - y_i).$$

The derivative of u with respect to a is

$$\frac{\partial u}{\partial a} = 2(ax_1 + b - y_1)x_1 + 2(ax_2 + b - y_2)x_2 + \dots + 2(ax_N + b - y_N)x_N = 2\sum_{i=1}^{N} x_i(ax_i + b - y_i).$$

¹ See Appendix D.







Equating these derivatives to zero gives

$$2\sum_{i=1}^{N} (ax_i + b - y_i) = 0,$$

$$2\sum_{i=1}^{N} x_i(ax_i + b - y_i) = 0.$$

It is sufficient to state here that the conditions that u should
be a minimum and not a maximum are also satisfied. Dividing
both equations by two, expanding, and dropping the indicated
limits on Σ,

$$a\sum x_i + Nb - \sum y_i = 0,$$

$$a\sum x_i^2 + b\sum x_i - \sum x_i y_i = 0.$$

These equations are known as normal equations. Systematic
schemes for writing and solving general normal equations are
developed in the Theory of Least Squares.

Solving for a and b and dropping the subscripts:

$$a = \frac{N\sum xy - \sum x \cdot \sum y}{N\sum x^2 - (\sum x)^2},$$

$$b = \frac{\sum x^2 \cdot \sum y - \sum x \cdot \sum xy}{N\sum x^2 - (\sum x)^2}.$$

Four different summations, Σx, Σy, Σx², and Σxy, are required.

9. Example. —Applying these results to the example just used, 
tabulate as in Table LIV. 


Table LIV. —Computations for Values of Parameters a and b in the 

Equation of Straight Line of Closest Fit 





Substituting in the formulas for a and b gives

$$a = \frac{9 \times 6{,}885 - 180 \times 270}{9 \times 5{,}100 - (180)^2},$$

$$b = \frac{5{,}100 \times 270 - 180 \times 6{,}885}{9 \times 5{,}100 - (180)^2}.$$

Whence

a = 0.99
b = 10.2.

The equation of the required straight line of closest fit by
the method of least squares is, then,

y = 0.99x + 10.2.
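The four summations and the two formulas for a and b translate directly into code. The following is a minimal sketch using the nine pairs of values from the example; the function name is illustrative.

```python
xs = [0, 5, 10, 15, 20, 25, 30, 35, 40]
ys = [10, 15, 20, 27, 28, 35, 41, 44, 50]


def least_squares_line(xs, ys):
    """Slope a and intercept b from the normal equations of the straight line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)                  # sum of x, sum of y
    sxx = sum(x * x for x in xs)               # sum of x squared
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x times y
    den = n * sxx - sx ** 2
    a = (n * sxy - sx * sy) / den
    b = (sxx * sy - sx * sxy) / den
    return a, b


a, b = least_squares_line(xs, ys)
print(sum(xs), sum(ys))          # 180 270
print(round(a, 2), round(b, 2))  # 0.99 10.2
```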

10. Comparison of Results.—The slope of this line

a = 0.99

lies between the two slopes that were obtained from the two
groups of equations in the method of averaging. The y-intercept,

b = 10.2,

also lies between the two y-intercepts obtained by the method of
averaging. Collecting the values of a and b that have been
obtained,

Averaging, first grouping,

a = 1.00, b = 10.0.

Averaging, second grouping,

a = 0.96, b = 10.8.

Least squares line,

a = 0.99, b = 10.2.

It is seen that the line of the first grouping is closer to the least
squares line than is the line of the second grouping. There are no
means at hand of knowing this beforehand. The labor of
computation for the least squares line is not very much greater than is
that for the method of averaging. Especially is this true if a
computing machine or mathematical tables are available.

11. Simplification.—It will now be shown how the formulas for
a and b, and hence the computations, may be simplified.

Take the origin of measurement of x at the arithmetic mean of x.
This has the effect of moving the y-axis to the value $a_x$ on the
x-axis (see Fig. 49). Call the new axes x′-axis and y′-axis with
origin at O′. The values of the independent variable now become
deviations from the arithmetic mean. Call them x′. The values
of the dependent variable y remain unchanged. The formulas
for a and b now are:

$$a = \frac{N\sum x'y - \sum x' \cdot \sum y}{N\sum x'^2 - (\sum x')^2},$$

$$b = \frac{\sum x'^2 \cdot \sum y - \sum x' \cdot \sum x'y}{N\sum x'^2 - (\sum x')^2}.$$



Fig. 49.—Least squares line, changed axes.


But since x′ is deviation from the arithmetic mean of x,

$$\sum x' = 0.$$

This reduces the formulas to:

$$a = \frac{N\sum x'y}{N\sum x'^2} = \frac{\sum x'y}{\sum x'^2},$$

$$b = \frac{\sum x'^2 \cdot \sum y}{N\sum x'^2} = \frac{\sum y}{N} = a_y.$$

The direction of the least squares line has, of course, not been
changed. The y′-intercept becomes the arithmetic mean of y.







12. Example.—Applying these results to the same data as 
before, Table LV results. 


Table LV

  x      y     x′     x′²      x′y

  0     10    −20     400     −200
  5     15    −15     225     −225
 10     20    −10     100     −200
 15     27     −5      25     −135
 20     28      0       0        0
 25     35      5      25      175
 30     41     10     100      410
 35     44     15     225      660
 40     50     20     400    1,000

180    270    ...   1,500    1,485

$$a = \frac{1{,}485}{1{,}500} = 0.99, \qquad b = \frac{270}{9} = 30.$$

In Fig. 49 the points (x, y) are plotted from the x-axis and the
y-axis. From the values a = 0.99 and b = 30, the line MN is
plotted from the x′-axis and the y′-axis. Its equation referred to
these axes is

y′ = 0.99x′ + 30.

The value, b = OB, of the y-intercept, on the y-axis, may be
measured graphically. It may be computed as follows:

$$\frac{AC}{BA} = a,$$

$$AC = O'C - O'A = O'C - OB = a_y - b,$$

$$BA = a_x.$$

Therefore:

$$\frac{a_y - b}{a_x} = a.$$

Solving for b gives:

$$b = a_y - a \cdot a_x.$$

In the example, then,

b = 30 − 0.99 × 20,
b = 30 − 19.8,
b = 10.2.

So the equation, referred to x-axis and y-axis, is

y = 0.99x + 10.2.
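The simplified computation, with x measured from its arithmetic mean, may be sketched as follows. The function name and the order of the returned values are illustrative choices, not the author's notation.

```python
xs = [0, 5, 10, 15, 20, 25, 30, 35, 40]
ys = [10, 15, 20, 27, 28, 35, 41, 44, 50]


def centered_fit(xs, ys):
    """Least squares line computed from deviations x' = x - mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n                     # intercept on the y'-axis
    dev = [x - mean_x for x in xs]           # the x' column of Table LV
    a = sum(d * y for d, y in zip(dev, ys)) / sum(d * d for d in dev)
    b = mean_y - a * mean_x                  # intercept referred to the original y-axis
    return a, mean_y, b


a, b_prime, b = centered_fit(xs, ys)
print(round(a, 2), round(b_prime, 2), round(b, 2))  # 0.99 30.0 10.2
```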

13. Graduation or Adjustment of Observed Values—From the 
equation of the line, the value of y may be obtained for any given 
value of x on the assumption that this least squares line is the 
best representation of the law connecting x and y. Also the 
adjustments to be made in the original observed values of y to 
fit this law may be computed. Computations and tabulation 
for the adjustment are shown in Table LVI. 


Table LVI.—Adjustment of Observations to Least Squares Line

Observed x   Observed y   y = 0.99x + 10.2   Correction to observed y

    0            10            10.20                +0.20
    5            15            15.15                +0.15
   10            20            20.10                +0.10
   15            27            25.05                −1.95
   20            28            30.00                +2.00
   25            35            34.95                −0.05
   30            41            39.90                −1.10
   35            44            44.85                +0.85
   40            50            49.80                −0.20

                                              +3.30   −3.30


The observed y's all have the same weight and the least squares
line passes through the average y, so that the totals of positive
and negative corrections balance each other.

Applying the corrections to the observed values of y, the 
observations are said to be graduated to the least squares line. 
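The corrections of Table LVI and their balancing totals can be recomputed from the fitted line. This is a small sketch; the variable names are illustrative.

```python
xs = [0, 5, 10, 15, 20, 25, 30, 35, 40]
ys = [10, 15, 20, 27, 28, 35, 41, 44, 50]

a, b = 0.99, 10.2                                     # least squares line from the text
graduated = [a * x + b for x in xs]                   # adjusted values of y
corrections = [g - y for g, y in zip(graduated, ys)]  # amount to add to each observed y

positive_total = sum(c for c in corrections if c > 0)
negative_total = sum(c for c in corrections if c < 0)

for x, y, g, c in zip(xs, ys, graduated, corrections):
    print(f"{x:3d} {y:3d} {g:6.2f} {c:+6.2f}")
print(f"{positive_total:+.2f} {negative_total:+.2f}")
```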

14. Parabola of Closest Fit.—After plotting points for a set of
observed pairs of values of two variables, x and y, judgment and
experience may lead to the belief that a parabola

y = ax² + bx + c

could be found that would fit better than a straight line. Again,
by the principle of least squares

$$u = \sum_{i=1}^{N} [ax_i^2 + bx_i + c - y_i]^2$$

must be made a minimum by proper choice of the parameters
a, b, and c. So,

$$\frac{\partial u}{\partial c} = 0, \qquad \frac{\partial u}{\partial b} = 0, \qquad \frac{\partial u}{\partial a} = 0$$

must be satisfied. And the result after dividing each equation
by 2 and dropping subscripts is

$$\sum (ax^2 + bx + c - y) = 0,$$

$$\sum x(ax^2 + bx + c - y) = 0,$$

$$\sum x^2(ax^2 + bx + c - y) = 0.$$

Expanding:

$$a\sum x^2 + b\sum x + Nc = \sum y,$$

$$a\sum x^3 + b\sum x^2 + c\sum x = \sum xy,$$

$$a\sum x^4 + b\sum x^3 + c\sum x^2 = \sum x^2 y.$$

These are the normal equations whose solution gives the
required values of a, b, and c to use in the parabola

y = ax² + bx + c

which best fits the set of points plotted from the observed data.

In as simple a case as the parabola, these normal equations may
be written out and solved without deriving general formulas as was
done for the straight line.

15. Example.—Take, for example, the following pairs of values of x
and y.

x    0   5  10  15  20  25
y   10   5  10  20  35  50

Plotting the corresponding points, it looks as though a parabola
might fit them fairly well. Form Table LVII, making a column for
each summation required in the normal equations. The first seven
columns are required.

Fig. 50.—Parabola of closest fit.




Table LVII.—Computation for Values of Parameters a, b, and c, in
the Parabola of Closest Fit

y = ax² + bx + c

  x     y      x²       x³        x⁴       xy      x²y   y of parabola  Correction

  0    10       0        0         0        0        0        8.57        −1.43
  5     5      25      125       625       25      125        7.14         2.14
 10    10     100    1,000    10,000      100    1,000       10.71         0.71
 15    20     225    3,375    50,625      300    4,500       19.29        −0.71
 20    35     400    8,000   160,000      700   14,000       32.86        −2.14
 25    50     625   15,625   390,625    1,250   31,250       51.43         1.43

 75   130   1,375   28,125   611,875    2,375   50,875                     0.00


Write the normal equations:

1,375a + 75b + 6c = 130.

28,125a + 1,375b + 75c = 2,375.

611,875a + 28,125b + 1,375c = 50,875.

Solving these equations for a, b, and c gives

a = 0.1000.

b = −0.7857.

c = 8.5714.

The required equation may now be written:

y = 0.10x² − 0.79x + 8.57.
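The three normal equations of the parabola can be formed and solved mechanically. The sketch below uses Cramer's rule on the 3 × 3 system, which is one convenient choice and not necessarily the scheme the author had in mind; all function names are illustrative.

```python
xs = [0, 5, 10, 15, 20, 25]
ys = [10, 5, 10, 20, 35, 50]


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_parabola(xs, ys):
    """Coefficients a, b, c of y = a*x**2 + b*x + c from the normal equations."""
    n = len(xs)
    sx = [sum(x ** p for x in xs) for p in range(5)]                     # sums of x**0 .. x**4
    sxy = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(3)]  # sums of x**p * y
    matrix = [
        [sx[2], sx[1], n],      # a*Sx2 + b*Sx  + N*c   = Sy
        [sx[3], sx[2], sx[1]],  # a*Sx3 + b*Sx2 + c*Sx  = Sxy
        [sx[4], sx[3], sx[2]],  # a*Sx4 + b*Sx3 + c*Sx2 = Sx2y
    ]
    d = det3(matrix)
    solution = []
    for k in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][k] = sxy[r]   # replace column k by the right-hand sides
        solution.append(det3(replaced) / d)
    return tuple(solution)


a, b, c = fit_parabola(xs, ys)
print(round(a, 4), round(b, 4), round(c, 4))
```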

16. Graduation.—Substitute the observed values of x in this 
equation and get the corresponding values of y for this parabola. 
The results are tabulated in next to the last column of Table 
LVII. The necessary corrections to apply to the observed values 
of y to graduate the data to this curve are tabulated in the last 
column. 

17. Resulting Curve.—In Fig. 50 the values of y for the curve
have been plotted and the curve drawn. It can be seen how
nearly the parabola fits the observed data. A few of the points
are not very close to the curve. Perhaps a curve of some other
type could be found to fit the observed data better. If so, its
equation would better formulate the law that would seem to
govern the data. In these illustrative problems, there are not a
sufficient number of observed points to make the determination
of a law very conclusive.

18. Simplification.—The y-axis may be made to pass through
the average x as was done for the straight line. When this is
done and abscissas become deviations from the average, $a_x$ =
12.5, designated by x′,

$$\sum x' = 0.$$

Also

$$\sum x'^3 = 0,$$

provided the x values are proceeding by constant differences as
in the illustrative table. This simplifies the equations somewhat.

19. Other Type Forms.—It is not so simple a matter to write
normal equations for graduation to fit a curve of type

$$y = ax^n$$

or

$$y = ke^{nx}.$$

Take logarithms of both sides of these equations, resulting in
linear equations. For the first one

log y = n log x + log a.

Put

log y = Y,
log x = X,
log a = b,

and the equation takes the form

Y = nX + b,

which is linear in X and Y. By tabulating the values of log x =
X and log y = Y, the straight line of closest fit may be obtained,
by determining the values of n and b such that

$$\sum (nX + b - Y)^2$$

shall be a minimum. The slope n, of the line so found, is the
required n of the equation of the curve. The number of which b,
the Y-intercept, is the logarithm is the required a of the equation
of the curve.
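The logarithmic transformation for the curve y = axⁿ can be sketched in code. The data below are hypothetical, generated from y = 3x² purely so that the expected answer is known; they are not taken from the text, and the function name is illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**n by a least squares line through (log x, log y)."""
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    count = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(u * u for u in X)
    sxy = sum(u * v for u, v in zip(X, Y))
    den = count * sxx - sx ** 2
    n = (count * sxy - sx * sy) / den   # slope of the line Y = n*X + b
    b = (sxx * sy - sx * sxy) / den     # Y-intercept, equal to log a
    return 10 ** b, n                   # a is the number whose logarithm is b


xs_demo = [1, 2, 3, 4, 5]               # hypothetical values, all positive
ys_demo = [3 * x ** 2 for x in xs_demo]
a, n = fit_power_law(xs_demo, ys_demo)
print(round(a, 6), round(n, 6))
```

Note that this minimizes the squared deviations of log y, not of y itself, so with scattered data the result generally differs slightly from a direct least squares fit of the curve.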

In the same manner, for fitting a curve of type

$$y = ke^{nx},$$

by taking logarithms of both sides, the following is obtained:

log y = nx log e + log k.

Put

log y = Y,
n log e = a,
log k = b,

and the equation takes the form

Y = ax + b.

This is linear in x and Y. By tabulating values of Y = log y,
and plotting x with Y, a straight line of closest fit may be found.
Its slope a, divided by log e = 0.43429, gives the n required for
the equation of the curve. The number of which the Y-intercept
b is the logarithm is the k required in the equation of the curve.
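The same procedure for the exponential type may be sketched as follows. The data are hypothetical, generated from y = 2e^(0.5x) so that the recovered constants can be checked; the function name is illustrative, and `math.log10(math.e)` is used in place of the rounded value 0.43429.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = k * e**(n*x) by a least squares line through (x, log10 y)."""
    Y = [math.log10(y) for y in ys]
    count = len(xs)
    sx, sy = sum(xs), sum(Y)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * v for x, v in zip(xs, Y))
    den = count * sxx - sx ** 2
    slope = (count * sxy - sx * sy) / den       # a = n * log10(e)
    intercept = (sxx * sy - sx * sxy) / den     # b = log10(k)
    n = slope / math.log10(math.e)              # divide the slope by log e
    k = 10 ** intercept
    return k, n


xs_demo = [0, 1, 2, 3, 4]                        # hypothetical values
ys_demo = [2 * math.exp(0.5 * x) for x in xs_demo]
k, n = fit_exponential(xs_demo, ys_demo)
print(round(k, 6), round(n, 6))
```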

20. Graphical Solution.—With paper ruled with logarithmic
scale one way and uniform scale the other way, the observed
values of x and y may be plotted, the straight line be drawn, and
the values of n and k be determined graphically. This matter
will be taken up later in the chapter on logarithmic
representation.

The curve

$$y = ax^n$$

may be similarly treated on paper having logarithmic scale both
ways.

21. Frequency Distributions.—One of the important problems
in curve fitting is that of graduating a frequency distribution to
a normal probability curve or some other recognized form whose
equation is known. Professor Karl Pearson has devised certain
type forms to fit almost any ordinary skew distribution.

It was seen that freshman heights (Chap. III, p. 86) fitted
fairly close to the symmetric point binomial curve that resulted
from the expansion of

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{10}.$$


22 .—It is now desired to graduate the distribution of fresh ™*“ 
heights to a normal probability curve. The equation o 


curve is recalled as being 


2* 1 


y = y oe 

The function y»e<- was given (Chap. III. P 82) -thehmi^ 
nz 14 )" as n increases w 

s at the arithmetic mean. Abscissas . are dev.at.ons from 



CURVE FITTING 


201 


arithmetic mean. In order that the curve may be independent 
of the unit measure used in measuring heights, it is plotted to the 
measure x/σ. The area under the curve between any two
ordinates y₁ and y₂ at deviations from the arithmetic mean of x₁
and x₂, multiplied by total frequency N, gives the theoretical
frequency between x₁ and x₂. The resulting theoretical frequency
distribution is that called for by the laws of probability in a
sample of N items selected from the universe from which the
given sample was selected at random, provided the frequency
distribution in that universe is a normal probability distribution.
It is assumed that aₓ is the arithmetic mean, and σₓ is the
standard deviation of all the items in that universe.
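
The rule just stated (area between two ordinates, multiplied by N) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the area is computed from the error function in the standard library rather than read from a printed table. The values σ = 2.51 in. and N = 629 are those used for the freshman heights.

```python
import math

def area_from_mean(t):
    """Area under the unit normal curve from the y-axis (t = 0) to t = x/sigma.

    The area is negative when t is negative, so differences of two such
    areas give the area between any two ordinates.
    """
    return 0.5 * math.erf(t / math.sqrt(2.0))

def theoretical_frequency(x1, x2, sigma, n_total):
    """Theoretical frequency between deviations x1 and x2 from the mean."""
    return n_total * (area_from_mean(x2 / sigma) - area_from_mean(x1 / sigma))

# Freshman heights: how many of 629 are expected within one sigma of the mean?
within_one_sigma = theoretical_frequency(-2.51, 2.51, sigma=2.51, n_total=629)
print(round(within_one_sigma, 1))
```

Because the areas are signed, the same function serves for a class that lies wholly on one side of the mean and for one that straddles it.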

23. The Normal Probability Curve.—It will be well to recall 
the shape of this curve. It is a symmetrical, bell-shaped curve. 


Fig. 51.—Normal probability curve.


The y-axis goes through the high point of the curve. The curve
is concave downward to the points where x/σ = −1 and x/σ =
+1. Beyond these points it is concave upward, getting closer
and closer to the x-axis, though it never quite reaches it. Its
form is shown in Fig. 51.

Tables for values of the argument x/σ have been computed,
giving ordinates to this curve and areas under the curve from the
y-axis to the ordinate at x/σ. In the tables the total area under
the curve is taken as 1. This is assuming the total frequency to
be 1; so that, for any sample of N items, the actual area is N
times the tabulated area. The ordinate y₀, at x/σ = 0, has been




found to be

$$y_0 = \frac{N}{\sigma\sqrt{2\pi}},$$

so that the equation of the curve takes the form

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}.$$

In order to make the measure of ordinates correspond to the x/σ
measure of abscissas, and to give a total area of 1 under the curve,
the actual ordinates measuring frequencies are divided by N/σ
and z is put for y ÷ N/σ.

The equation is then of the form

$$y = \frac{N}{\sigma}\, z,$$

in which

$$z = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}.$$

The ordinates given in the tables are values of z. 

Table LVIII is extracted from a table of ordinates and areas. 

Table LVIII.—Areas and Ordinates of the Normal Probability Curve

         Area from the                        Area from the
         y-axis to the    Ordinate            y-axis to the    Ordinate
  x/σ    ordinate at x/σ     z         x/σ    ordinate at x/σ     z

  0.0      0.000,00        0.3989      1.8      0.464,07        0.0790
  0.1      0.039,83        0.3970      1.9      0.471,28        0.0656
  0.2      0.079,26        0.3910      2.0      0.477,25        0.0540
  0.3      0.117,91        0.3814      2.1      0.482,14        0.0440
  0.4      0.155,42        0.3683      2.2      0.486,10        0.0355
  0.5      0.191,46        0.3521      2.3      0.489,28        0.0283
  0.6      0.225,75        0.3332      2.4      0.491,80        0.0224
  0.7      0.258,04        0.3123      2.5      0.493,79        0.0175
  0.8      0.288,14        0.2897      2.6      0.495,34        0.0136
  0.9      0.315,94        0.2661      2.7      0.496,53        0.0104
  1.0      0.341,34        0.2420      2.8      0.497,44        0.0079
  1.1      0.364,33        0.2179      2.9      0.498,13        0.0060
  1.2      0.384,93        0.1942      3.0      0.498,65        0.0044
  1.3      0.403,20        0.1714      3.1      0.499,03        0.0033
  1.4      0.419,24        0.1497      3.2      0.499,31        0.0024
  1.5      0.433,19        0.1295      3.3      0.499,52        0.0017
  1.6      0.445,20        0.1109      3.4      0.499,66        0.0012
  1.7      0.455,43        0.0940      3.5      0.499,77        0.0009





It may be added that at

x/σ = 4.00, Area = 0.499,968,3, z = 0.000,133,8,

x/σ = 5.00, Area = 0.499,999,71, z = 0.000,001,486,7,

x/σ = 6.00, Area = 0.499,999,99, z = 0.000,000,006,1.
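
Entries of this kind can be checked by direct computation. The sketch below is not the method by which the printed table was prepared; it simply evaluates the formula for z given above and obtains the area from the error function of the Python standard library. The function names are chosen for illustration.

```python
import math

def ordinate(t):
    """Ordinate z of the normal probability curve at t = x/sigma."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def area(t):
    """Area under the curve from the y-axis to the ordinate at t = x/sigma."""
    return 0.5 * math.erf(t / math.sqrt(2.0))

# Print every fifth line of the table for comparison with the printed values.
for tenths in range(0, 36, 5):
    t = tenths / 10.0
    print(f"{t:3.1f}   {area(t):.5f}   {ordinate(t):.4f}")
```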

24. Relation between σ and p.e.—Incidentally, it may be
noted that the ordinate cutting off one-fourth of the area falls
between x/σ = 0.6 and x/σ = 0.7. By direct interpolation in this
table one-fourth of the area lies approximately at

x/σ = 0.67;

so half the total frequency lies between x = −0.67σ and x =
+0.67σ, and probable error, or quartile deviation from the
median, equals 0.67σ.
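
The interpolation can be carried further by machine. The following sketch, offered only as an illustration, searches by repeated halving between 0.6 and 0.7 (the bracket read from the table) for the value of x/σ at which the area from the y-axis is exactly one-fourth. The function names are hypothetical.

```python
import math

def area(t):
    """Area under the normal curve from the y-axis to t = x/sigma."""
    return 0.5 * math.erf(t / math.sqrt(2.0))

def probable_error_ratio(tolerance=1e-10):
    """Find t = x/sigma with area(t) = 0.25 by bisection on [0.6, 0.7]."""
    low, high = 0.6, 0.7
    while high - low > tolerance:
        middle = 0.5 * (low + high)
        if area(middle) < 0.25:
            low = middle      # too little area: move the lower bound up
        else:
            high = middle     # too much area: move the upper bound down
    return 0.5 * (low + high)

ratio = probable_error_ratio()
print(round(ratio, 4))
```

The result agrees with the 0.67 of the text to two decimal places.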

25. Probability of Deviations Greater than 4σ.—At x/σ = 4
there is to the right of the y-axis 0.4999683 of the total area.
And 0.50000 − 0.4999683 = 0.0000317 of the total area lies
beyond the ordinate at x/σ = 4. This makes it clear that the
chances of selecting at random an item having a deviation from
the arithmetic mean of more than four times the standard deviation
are very small. The odds against it are 4,999,683:317, or
about 16,000:1. This means that if a man's height is measured
and is found to differ from the arithmetic mean of freshman
heights by more than four times the standard deviation, that is,
by more than

4 × 2.51 in. = 10.04 in.,

there is a very high probability that this man does not belong to 
the universe of freshmen from which the sample was taken. The 
arithmetic mean height was found to be 68.18 in. 

68.18 in. + 10.04 in. = 78.22 in., 

68.18 in. - 10.04 in. = 58.14 in. 


By inspection of the table of freshman heights, it is found that 
there was not one of the 629 so tall as 78.22 in. nor so short as 
58.14 in. Moreover, the chances of finding a freshman as tall 




as 78 in. or as short as 58 in. are about 1 in 16,000, if the distribution
of freshman heights, in general, is close to the normal
probability distribution.
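
The arithmetic of this section may be retraced as follows. The sketch assumes nothing beyond the figures already quoted (mean 68.18 in., σ = 2.51 in.); the names are illustrative, and the tail area is taken from the complementary error function instead of the table.

```python
import math

def tail_beyond(t):
    """Fraction of the total area beyond the ordinate at x/sigma = t, one side only."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

one_side = tail_beyond(4.0)                 # area beyond +4 sigma
inside_half = 0.5 - one_side                # area from the y-axis out to +4 sigma
odds_against = inside_half / one_side       # same ratio as 4,999,683 : 317

mean, sigma = 68.18, 2.51
limits = (mean - 4 * sigma, mean + 4 * sigma)

print(f"{one_side:.7f}", round(odds_against), limits)
```

Since the curve is symmetrical, the ratio of the half-areas equals the ratio of the whole area within 4σ to the whole area beyond it on both sides, so the odds apply to deviations in either direction.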

26. Graduation.—Now return to the problem of graduating the 
heights of the 629 freshmen tabulated in Table XIV to a normal 
frequency distribution. The necessary data and computations 
are tabulated in Table LIX. The first column is class boundaries. 

Table LIX.—Graduation of Freshman Heights to Normal Probability

σ = 2.51, N = 629, a = 68.18.


The second column is the corresponding x, or deviation from
arithmetic mean. The third column is x/σ. The fourth column
is A, the area from the y-axis to the ordinate at x/σ. It is
obtained from the table of areas. The fifth column is ΔA, the
increment of area between successive class boundaries. It is
the fraction of total frequency in each of the successive classes.
It is found by subtracting each entry in the A column from the












preceding entry until the class is reached in which the arithmetic
mean lies. In that class ΔA equals the sum of the areas on each
side of the arithmetic mean.

$$\Delta A = 0.23526 = 0.14431 + 0.09095.$$

Beyond this class ΔA for any class is the area at the upper
boundary of the class minus the area at the lower boundary.

The sixth column is the theoretical frequency for each of the 
successive classes on the basis of total frequency equals N = 629. 
It is obtained by multiplying the entries of the fifth column by 
629. 

The seventh column is the observed frequency of the actual 
sample measured. It is the frequency column of Table XIV. 

The eighth column is the correction to be applied to the 
observed frequency to give the theoretical frequency. It is 
obtained by subtracting the observed frequency from the 
theoretical frequency. 

The curve may be plotted by plotting the theoretical 
frequencies at the midpoints of the classes. 
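
The column-by-column procedure can be condensed into a short routine. The sketch below is an illustration under stated assumptions: the class boundaries are hypothetical (they are not those of Table LIX), while the mean 68.18, σ = 2.51, and N = 629 are the values given above. Areas are treated as negative below the mean, so that one subtraction serves every class, including the class in which the mean lies.

```python
import math

def area(t):
    """Signed area from the y-axis to t = x/sigma (negative below the mean)."""
    return 0.5 * math.erf(t / math.sqrt(2.0))

def graduate(boundaries, mean, sigma, n_total):
    """Theoretical frequency of each class lying between successive boundaries."""
    signed = [area((b - mean) / sigma) for b in boundaries]          # columns 2 to 4
    increments = [upper - lower for lower, upper in zip(signed, signed[1:])]  # column 5
    return [n_total * delta for delta in increments]                 # column 6

# Hypothetical class boundaries, 2 in. apart: 60.5, 62.5, ..., 76.5
boundaries = [60.5 + 2.0 * i for i in range(9)]
theoretical = graduate(boundaries, mean=68.18, sigma=2.51, n_total=629)

for lower, frequency in zip(boundaries, theoretical):
    print(f"{lower:5.1f} to {lower + 2.0:5.1f}   {frequency:6.1f}")
```

Subtracting the observed frequencies from these theoretical frequencies, class by class, gives the corrections of the eighth column.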

27. Resulting Curve.—The resulting curve is shown in Fig. 52.
If it is desired to put in intermediate points, they may be found
by means of Table III in Davenport’s “Statistical Methods.”
The frequency polygon of Fig. 31 (page 86, Chap. III) is also
drawn in this figure to show to the eye how closely it fits the
normal probability curve.




28. Skewed Curves.—The normal probability curve of best fit 
has been found. If the given distribution is skewed, some other 
curve might fit better than the normal probability curve. The 
problem of fitting curves to skewed distributions is not for a 
beginning course in statistics. 

There may be a certain amount of skewness and yet it may be 
permissible to graduate the data to a normal probability curve. 
If the measure of skewness is less than twice its probable error, 
it is permissible to fit a normal probability curve, provided the 
distribution approximates the normal type. 

The skewness is then not significant. The departure from a 
normal probability distribution may easily be due to the accidents 
of sampling. If it is more than four times its probable error, it is 
not safe to fit a normal probability curve. 

Karl Pearson has derived equations for a set of curves by means 
of which very many distributions may be fitted. 

29. Effect of the Size of σ.—Since half the area of the normal
probability curve is included between x = −0.67σ and x =
+0.67σ, it is evident that a large value of σ will make the curve
spread out rather flat, while a small value of σ will make the
curve narrower and run up steeper, other things being equal.
The first case shows wide distribution, great variation. The latter
case shows small variation, great tendency to concentrate about
the mean.

30. Kurtosis.—Whether σ be large or small, it may be found
that the curve of a given symmetric distribution may be flatter 
than the normal probability curve, may coincide with it, or may 
be more peaked. This shape characteristic is important and is 
called kurtosis. If the given curve is flatter than the normal 
probability curve, it is said to be platykurtic. If it is more 
peaked than the normal probability curve, it is called leptokurtic. 
The normal probability curve itself is said to be mesokurtic. 

Curve fitting is also effected by what is known as the method of 
moments. This method cannot be taken up here. 





Table LX.—Maximum Daily Temperature at Seattle, Jan. 1, 1923, to
Feb. 14, 1923, Inclusive, with Moving Average for Eight-day Periods

(Each moving average falls midway between the date on its line and the
next date.)

  Date      Maximum      Moving       Date      Maximum      Moving
  1923    temperature    average      1923    temperature    average

 Jan.  1      46           ..        Jan. 24      42           40
       2      49           ..             25      41           40
       3      50           ..             26      39           39
       4      46           50             27      40           38
       5      52           51             28      39           38
       6      53           50             29      32           37
       7      53           50             30      35           37
       8      51           49             31      38           38
       9      51           48        Feb.  1      37           38
      10      47           47              2      39           39
      11      44           47              3      39           40
      12      41           47              4      42           41
      13      43           47              5      43           41
      14      47           46              6      39           42
      15      50           46              7      42           41
      16      53           46              8      47           39
      17      48           45              9      40           37
      18      42           44             10      40           36
      19      42           43             11      35           ..
      20      40           42             12      29           ..
      21      36           41             13      20           ..
      22      40           41             14      32           ..
      23      45           40






31.—Sometimes a variable changes with
alternate increases and decreases, in approximately periodic
waves. In such case a curve of general trend may be fitted by
the method of moving averages. An example will best illustrate
the method.

In Table LX are tabulated maximum temperatures from Jan. 
1, 1923, to Feb. 14, 1923, inclusive. 1 The data are plotted to 
scale as shown in Fig. 53 and the points connected by straight 
lines from each to the next. It is noticeable that the graph 
fluctuates up and down in more or less regular waves. There 
are low points at Jan. 4, 12, 21, 29, and Feb. 6 and 13. From 
each to the next gives periods of 8, 9, 8, 8, and 7 days, respectively. 
These periods are approximately equal, with an average of 8 
days. To get the moving average, find the average maximum
temperature for the first 8 days. Enter this in the moving average
column at the middle of this 8-day period, between Jan. 4
and 5. The next entry in the moving average column is the average
maximum temperature for the 8 days from Jan. 2 to 9
inclusive. The next is the average for Jan. 3 to 10 inclusive. In
this manner the moving average values are recorded to the last
entry, which is the average maximum temperature for the last 8
days of the record. A convenient way to get the values after
the first is as follows: The sum of the first 8 temperatures is 400,
giving an average of 50. By dropping off 46, the temperature
for Jan. 1, and adding on 51, the temperature for Jan. 9, the
total is increased by 5, making it 405, with an average of 51.
Then by dropping off 49, the temperature for Jan. 2, and adding
on 47, the temperature for Jan. 10, the total is diminished by 2,
making it 403, with an average of 50. The next change in the
total is 44 − 50, or −6, making the third total 397, with an average
of 50. The next step diminishes the total by 5, making it 392,
with an average of 49. Continue the process to the end.

Fig. 53.—Moving average of maximum daily temperature.
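
The drop-one, add-one process lends itself to a short routine. In the sketch below the temperatures are those of Table LX; the function name is illustrative, and halves are rounded upward to the nearest whole degree, which is an assumption about how the printed averages were rounded.

```python
# Maximum daily temperatures from Table LX.
temps = [46, 49, 50, 46, 52, 53, 53, 51, 51, 47, 44, 41, 43, 47, 50, 53,   # Jan. 1-16
         48, 42, 42, 40, 36, 40, 45, 42, 41, 39, 40, 39, 32, 35, 38,       # Jan. 17-31
         37, 39, 39, 42, 43, 39, 42, 47, 40, 40, 35, 29, 20, 32]           # Feb. 1-14

def moving_totals(values, period):
    """Running totals of `period` successive values, by dropping one and adding one."""
    total = sum(values[:period])
    totals = [total]
    for dropped, added in zip(values, values[period:]):
        total += added - dropped
        totals.append(total)
    return totals

totals = moving_totals(temps, 8)
averages = [int(total / 8 + 0.5) for total in totals]   # round halves upward

print(totals[:5])     # the totals worked out in the text
print(averages[:5])
```

With an eight-day period and 45 days of record there are 38 averages, the first falling between Jan. 4 and 5 and the last between Feb. 10 and 11.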

1 From “Annual Meteorological Summary of the United States Weather
Bureau,” at Seattle, Wash.





32. Moving Average Graph or Line of Trend.—Having completed
the moving average column, plot the values on the temperature
graph at the dates to which they belong. Connect the
points by straight lines from each to the next, or by a smooth
curve. This curve shows the general trend of maximum temperature
throughout the period except for the first four days and the
last four days. If the trend line is nearly straight or changing
steadily, it may usually be extended to cover the entire period.

33. What the Trend Shows.—A study of the trend line shows 
that there was a general tendency slowly downward through 
the month of January. Then there was a slight upward tendency. 
The daily temperatures were fluctuating back and forth on both 
sides of the trend. About Feb. 1 there was an upward tendency 
for about a week. The rest of the record shows a sharp decline. 
It would hardly be fair to extend this line of trend to the end of 
the period because the sharp decline may be due to a temporary 
cold spell. In fact, the sharp rise in daily temperature from Feb. 
13 to 14 seems to indicate as much. 

34. Short-time Fluctuations.—The deviations of daily maximum
from the trend line may be plotted and thus show the short-time
fluctuations independent of the long-time changes. If
these deviations are to be taken from the table, it is more convenient
if a moving average period with an odd number of days in
it can be selected so that the entries in the moving average column
will fall on exact dates. The deviations are plotted from a zero
base line, above it if the deviations are positive, and below it if
the deviations are negative.
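
A period with an odd number of days may be handled as in the following sketch. The seven-day period and the function names are chosen only for illustration; the nine temperatures are the first nine of Table LX.

```python
def centered_moving_average(values, period):
    """Moving average with an odd period, keyed by the index of the middle day."""
    if period % 2 == 0:
        raise ValueError("period must be odd so that each average falls on a date")
    half = period // 2
    return {
        i: sum(values[i - half:i + half + 1]) / period
        for i in range(half, len(values) - half)
    }

def deviations_from_trend(values, period):
    """Observed value minus moving average, for each day that has an average."""
    trend = centered_moving_average(values, period)
    return {i: values[i] - average for i, average in trend.items()}

first_nine_days = [46, 49, 50, 46, 52, 53, 53, 51, 51]   # Jan. 1-9, Table LX
deviations = deviations_from_trend(first_nine_days, 7)

for index, deviation in deviations.items():
    print(f"Jan. {index + 1}: {deviation:+.1f}")
```

Positive deviations are plotted above the zero base line and negative ones below it.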


Exercises 

1. A rubber band, stretched under a force of p lb., is found to increase in
length l mm. The following observations were made:

p    2    3    4    5    6    7    8    9   10   11   12
l   10   17   21   28   36   43   51   60   70   83   98

Find, by the method of averaging, the straight line of closest fit.

2. For the data of Ex. 1, find the straight line of closest fit by the method 
of least squares. 

3. From the same set of data, compute and tabulate adjustments to 
the observed values of l to fit the least squares line of closest fit. 

4. For the same set of data, by the principle of least squares, find the
parabola of closest fit having the equation $l = ap^2 + bp + c$.

5. Graduate the values of l to fit the parabola obtained in Ex. 4. 






6. For the same set of data in Ex. 1, find the curve of closest fit having
its equation of the form

$$l = ap^n.$$

7. Plot the data of Ex. 1, and on the same axes plot the resulting curves 
of Ex. 1, 2, 4, and 6. Which of the four seems to give the most satisfactory 
fit? 

8. Graduate the frequencies of freshman weights as shown in Table 
XIII, Chap. II, to a normal probability curve. 

9. Plot the frequency polygon of freshman weights and the normal 
probability curve of Ex. 8 on the same set of axes. Does this curve seem to 
fit the data well? 

10. Graduate the frequencies of heads falling in 500 random throws of 
seven dimes to a normal probability curve (see Table XV, Chap. II). 
Does this curve seem to fit the data well? Does it make a better fit than 
the curve of Ex. 8? 

11. In the following table are the maximum temperatures each day for 
the month of July. Plot the values, determine the average wave length, 
compute a set of moving averages, and plot the results. What was the 
general trend of temperature during the month? 


Maximum Temperatures during July

  Day     Maximum         Day     Maximum
        temperature             temperature

   1        74             17        84
   2        80             18        88
   3        85             19        81
   4        71             20        75
   5        68             21        81
   6        79             22        83
   7        80             23        85
   8        85             24        80
   9        81             25        74
  10        76             26        72
  11        74             27        73
  12        80             28        78
  13        89             29        81
  14        75             30        82
  15        78             31        83
  16        80


12. Graduate the leaf lengths of Table XXIII to a normal probability 
curve. Does the curve appear to fit the data well? 

13. Graduate the leaf breadths of Table XXIV to a normal probability
curve. Does the curve appear to fit the data well?

14. Graduate the leaf lengths of Ex. 19, Chap. II to a normal probability
curve. Does the curve appear to fit well?

15. Graduate the leaf breadths of Ex. 20, Chap. II to a normal probability
curve. Does the curve appear to fit well?

16. A weight moving under a certain force was observed at the end of 
each second t to have passed over a distance s from the beginning. The 
results of the observations were: 

t    1     2     3     4     5     6     7      8
s   0.3   1.1   2.2   3.8   6.4   9.0   12.0   16.0

Find the straight line of closest fit by the method of least squares. 

17. Find, by the method of least squares, the parabola of closest fit to the 
data of Ex. 16. 

18. Find, by the method of least squares, the curve of the best fit to the
data of Ex. 16 whose equation is of the form $s = at^n$.

19. Graduate the observed values of s to fit the parabola of Ex. 17. 

20. Plot the observed data of Ex. 16 and on the same axes plot the curves 
determined in Exs. 16 to 18. Which curve seems to fit the observed data 
best? Does it seem reasonable to assume that the law of motion of the 
weight is determined by the equation of this curve? 


CHAPTER IX 


CORRELATION, REGRESSION 


An important question that arises in statistical study of variables
is in regard to any probable relationship in the changes in
one variable in connection with the changes in another variable.
If one variable increases, is there a tendency for the other variable
to increase or vice versa? Or, if one variable changes, does the
change seem to have no influence on the changes of the other
variable? Do fathers that are taller than the average of men
tend, in the long run, to have sons taller than the average?
If so, to what extent? Or, if the sons tend to revert, or regress,
to average height, what is the measure of the tendency? Do
industrial accidents tend to increase or decrease with reduction
of working hours? Do bond prices tend to fluctuate with call
rates of interest? Do freshman grades in college have any
relation to high school grades?

1. Correlation.—Two variables are said to be correlated if, when 
any value of the first variable be selected, it is found that the 
average of the associated values of the other variable seems to 
depend on the size of the selected value of the first variable. If, 
when the selected value of the first variable is small, the average 
of the associated values of the other variable is small also, increasing
with increase in size of the selected value of the first variable,
the correlation is direct or positive. If, on the other hand, the 
average of the associated values of the second variable is large, 
decreasing with increase in size of the selected value of the first 
variable, correlation is inverse or negative. 

2. Illustration.—To illustrate the meaning of correlation, 
Table LXI is made from Table XXII, page 38, Chap. II. 

The first variable, X, represents leaf breadths, and the other, Y,
represents leaf lengths. Let 19 be the selected value of X. The
average value of the associated Y's is

$$\frac{2 \times 22 + 1 \times 27 + 1 \times 32}{4} = 25.75.$$

If 22 is the selected X, the average of the associated Y's is

$$\frac{1 \times 22 + 3 \times 27 + 3 \times 32 + 2 \times 37 + 1 \times 42}{10} = 31.5.$$




Table LXI. —Scatter Diagram of Lengths and Breadths of CO 

Leaves 


Measured in Millimeters 



An increase in X of 3 mm. = 1 class
interval, is accompanied by an increase of 31.50 mm. − 25.75
mm. = 5.75 mm. = 1.15 class intervals, in the average Y associated
with X.

The average Y associated with X = 31 is

$$\frac{2 \times 32 + 3 \times 37 + 3 \times 42 + 1 \times 47 + 1 \times 52}{10} = 40.0.$$

Here an increase in X of 12 mm. = 4 class intervals has given
an increase in the average Y associated with X of 14.25 mm. =
2.85 class intervals.

Similarly, if Y = 27 be selected, the average X associated with
it is

$$\frac{1 \times 19 + 3 \times 22 + 3 \times 25 + 1 \times 28}{8} = 23.5.$$

The average X associated with Y = 42 is 27.5. An increase in
Y of 15 mm. = 3 class intervals, has given an increase in the
average X associated with Y of 4.0 mm. = 1⅓ class intervals.
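
These averages are weighted means of class marks, the weights being the cell frequencies of one column (or one row) of the scatter diagram. A minimal sketch follows; the function name is illustrative, and only the frequencies quoted in the computations above are used.

```python
def conditional_mean(frequencies):
    """Average of the class marks, each weighted by its cell frequency."""
    total = sum(frequencies.values())
    return sum(mark * count for mark, count in frequencies.items()) / total

# Leaf lengths Y (mm.) and their frequencies in three columns of Table LXI.
y_given_x19 = {22: 2, 27: 1, 32: 1}
y_given_x22 = {22: 1, 27: 3, 32: 3, 37: 2, 42: 1}
y_given_x31 = {32: 2, 37: 3, 42: 3, 47: 1, 52: 1}

# Leaf breadths X (mm.) and their frequencies in the row Y = 27.
x_given_y27 = {19: 1, 22: 3, 25: 3, 28: 1}

rise_in_y = conditional_mean(y_given_x31) - conditional_mean(y_given_x19)
print(rise_in_y, rise_in_y / 5)   # in mm. and in class intervals of 5 mm.
```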






Inspection of the diagram readily shows that an increase in 
either leaf length or leaf breadth is everywhere accompanied 
with an increase in the average of the other associated with it. 
This means that there is direct correlation between lengths and 
breadths of leaves. This is shown by a glance of the eye, since the 
entries in the cells of the diagram tend to cluster about the 
diagonal that slopes up to the right. 

If the entries cluster about the diagonal sloping down to the
right, it shows that with increase of X the average Y associated
with it decreases, and inverse correlation is indicated.

If the entries tend to follow neither diagonal nor any other 
definite line but are scattered around indiscriminately, no 
correlation is indicated. 

3. Perfect Correlation and Independence.—If an equational 
relationship can be established between the selected value of the 
first variable and the average of the associated values of the other 
variable, correlation is perfect, positive or negative as the case may
be. If the average of the associated values of the second variable
seems to be completely uninfluenced by the size of the selected
value of the first variable, the two variables are said to be uncorrelated
or independent. It is immaterial which variable is called
the first and which the second. So the same statements would 
apply to selected values of the second variable and the average of 
associated values of the first variable. 

4. Measure.—The numerical measure of the tendency of the 
size of one variable apparently to affect the average size of the 
associated values of the other variable should be of the nature of a 
coefficient or ratio. For perfect direct correlation, its value 
would be +1, and for perfect inverse correlation, — 1. For no 
correlation, it would be zero. One of the most used coefficients 
of correlation is one devised by Karl Pearson and is known as 

Pearson’s coefficient of correlation. 
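
The three landmark values named above can be seen in a small computation. The formula for Pearson's coefficient is not developed at this point in the text; the sketch below assumes the usual product-moment definition (sum of products of deviations, divided by the square root of the product of the sums of squared deviations), and the data are invented solely to show the values +1, −1, and 0.

```python
import math

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation (standard definition, assumed here)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]))     # perfect direct correlation
print(pearson_r(xs, [8, 6, 4, 2]))     # perfect inverse correlation
print(pearson_r(xs, [1, -1, -1, 1]))   # no linear tendency either way
```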

5. Correlation Developed through Probability.—A good idea of
the meaning of various degrees of correlation and the connection
with probability may be gained by the results of tossing of
coins, as shown in the following eight scatter diagrams, beginning
with Table LXII.





Table LXII.—Number of Heads Falling in Pairs of Throws of Seven
Dimes, All Seven Being Thrown Each Time

 Number of heads           Number of heads in first throw
 in second throw      0    1    2    3    4    5    6    7   Totals

        0             .    .    .    .    1    .    .    .       1
        1             .    .    3    3    4    2    2    .      14
        2             .    2    8   10   10    6    1    .      37
        3             .    4    7   17   16    9    2    .      55
        4             .    4    5   14   22    5    4    .      54
        5             .    3    2    7    9    9    1    .      31
        6             .    .    1    2    2    2    .    .       7
        7             .    .    .    .    1    .    .    .       1

     Totals           0   13   26   53   65   33   10    0     200


Table LXIII.—Number of Heads Falling in Pairs of Throws of
Seven Dimes. In the Second Throw One Marked Dime Remains
as It Fell in the First Throw

 Number of heads in first throw     0    1    2    3    4    5    6    7   Total
 Column totals                      3   11   27   51   66   33    8    1    200

 Number of heads in second throw    0    1    2    3    4    5    6    7   Total
 Row totals                         1    8   28   48   60   43   10    2    200

































































































Table LXIV.—Number of Heads Falling in Pairs of Throws of
Seven Dimes. In the Second Throw Two Marked Dimes Remain
as They Fell in the First Throw

 Number of heads in first throw     0    1    2    3    4    5    6    7   Total
 Column totals                      0   11   41   56   43   33   14    2    200


Table LXV.—Number of Heads Falling in Pairs of Throws of Seven
Dimes. In the Second Throw Three Marked Dimes Remain as
They Fell in the First Throw

 Number of heads           Number of heads in first throw
 in second throw      0    1    2    3    4    5    6    7   Totals

        0             .    1    1    .    .    .    .    .       2
        1             .    5    1    4    2    1    .    .      13
        2             .    4   11   11   10    2    .    .      38
        3             1    3   12   20   13    7    1    .      57
        4             .    .    5   12   12    6    6    1      42
        5             .    .    3    2   13    7    7    2      34
        6             .    .    .    1    4    4    2    1      12
        7             .    .    .    .    1    1    .    .       2

     Totals           1   13   33   50   55   28   16    4     200




























































































Table LXVI.—Number of Heads Falling in Pairs of Throws of Seven
Dimes. In the Second Throw Four Marked Dimes Remain as
They Fell in the First Throw



Table LXVII.—Number of Heads Falling in Pairs of Throws of
Seven Dimes. In the Second Throw Five Marked Dimes Remain
as They Fell in the First Throw

 Number of heads           Number of heads in first throw
 in second throw      0    1    2    3    4    5    6    7   Totals

        0             .    3    .    .    .    .    .    .       3
        1             1    1    5    6    .    .    .    .      13
        2             .    3   17   13    3    .    .    .      36
        3             .    1    7   14    9    2    .    .      33
        4             .    .    1   19   25    7    2    .      54
        5             .    .    .    8   19   16    7    .      50
        6             .    .    .    .    .    6    3    1      10
        7             .    .    .    .    .    .    .    1       1

     Totals           1    8   30   60   56   31   12    2     200
































































































Table LXVIII.—Number of Heads Falling in Pairs of Throws of
Seven Dimes. In the Second Throw Six Marked Dimes Remain
as They Fell in the First Throw

 Number of heads           Number of heads in first throw
 in second throw      0    1    2    3    4    5    6    7   Totals

        0             1    .    .    .    .    .    .    .       1
        1             .    4    6    .    .    .    .    .      10
        2             .    1   11   12    .    .    .    .      24
        3             .    .   13   29   13    .    .    .      55
        4             .    .    .   15   32   19    .    .      66
        5             .    .    .    .   11   16    6    .      33
        6             .    .    .    .    .    .    8    2      10
        7             .    .    .    .    .    .    1    .       1

     Totals           1    5   30   56   56   35   15    2     200



Table LXIX.—Number of Heads Falling in Pairs of Throws of Seven
Dimes. In the Second Throw All Seven Dimes Remain as They
Fell in the First Throw



To obtain Table LXII seven dimes were shaken and tossed on a
table, and the number of heads noted. Then the seven dimes
were reshaken and tossed again and the number of heads noted.
If, for example, there were three heads on the first toss and five
heads on the second toss, a score mark was made in the cell in
column headed 3 and in row headed 5. This operation of taking
pairs of tosses of the seven dimes was repeated till 200 pairs of
tosses were made and the score marks all entered in the diagram.
The table was completed by replacing the score marks in each cell
by the number of such marks.

It is evident that the number of heads falling on any single first 
toss could have no influence whatever on the number of heads 
falling on the second toss. In other words, the two variables, 
number of heads on the first toss, and number of heads on second 
toss, are completely independent and there is no correlation. 

Now see what influence the size of X, the number of heads on
first throw, has on Ȳ, the average number of heads on second
throw, associated with X.

X    0    1    2    3    4    5    6    7
Ȳ    0   3.6  2.9  3.3  3.4  3.6  3.1   0

The change in Ȳ seems to have nothing to do with the value of
X, except that, at the extremes, Ȳ seems fairly constant. It is
likely that the extremes are accidents of the sample.

The entries in the diagram seem to cluster about the center of 
the diagram without spreading out in one direction more than 
another. 

Table LXIII was obtained in a similar manner, except that one
dime was blackened and left on the table so that only the other
six were actually thrown on the second toss of each pair of tosses.
If this black dime fell heads on the first toss, it remained heads on
the second toss. Then at least one head on the second toss was
certain. If the black dime fell tails on the first toss, it remained
tails on the second toss. Then not more than six heads were
possible on the second toss. Two hundred pairs of tosses were
made, and the score marks entered and recorded as was done for
Table LXII.

It was not possible to have seven heads the first toss and no
heads the second toss, for if all seven dimes fell heads the first toss,
the black dime fell heads and remained heads on the second toss,
assuring at least one head on the second toss. Neither was it
possible to have no heads the first toss, and seven heads the
second toss. Diagonal lines are drawn through the cells in which
no entry was possible to occur.

The number of heads falling in any first toss now has a slight
influence on the number of heads falling on the corresponding
second toss, for, whichever way the black dime falls and is
recorded for the first toss, it is recorded the same way for the
second toss.

Working out the values of Ȳ for each value of X, as was done
before, the result is:

X    0    1    2    3    4    5    6    7
Ȳ   3.0  2.7  3.3  3.7  3.7  4.1  4.1  4.0

It is discovered that there is a slight tendency such that the
larger the number of heads on the first toss the larger will be the
number of heads on the second toss. In other words, some correlation
is indicated.

Table LXIV was obtained in exactly the same manner except
that two dimes were blackened and left on the table after each
first toss so that the two remained unchanged on the corresponding
second toss.

Now if all seven dimes fell heads on the first toss, there would
have to be at least two heads on the second toss, for the two black
dimes would be both heads. If six dimes fell heads on the first
toss, there would have to be at least one head on the second toss,
for at least one of the black dimes would be heads. Similarly, if
there were no heads on the first toss, there could not be more
than five heads on the second toss, and if there were one head on
the first toss, there could not be more than six on the second toss.
Diagonal lines are drawn to cut off the cells in which entries are
impossible to occur.
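
The reasoning about impossible cells extends to any number of blackened dimes. The sketch below is an illustration with hypothetical function names: given the number of heads on the first toss and the number of dimes held fixed, it works out the least and greatest number of heads possible on the second toss, and from these the cells that would be cut off by the diagonal lines.

```python
def second_toss_range(first_heads, held, coins=7):
    """Least and greatest number of heads possible on the second toss."""
    rethrown = coins - held
    # Heads among the held dimes: at least the excess of first-toss heads over
    # the rethrown dimes, and at most the smaller of `held` and `first_heads`.
    least_held_heads = max(0, first_heads - rethrown)
    most_held_heads = min(held, first_heads)
    # The rethrown dimes may add anywhere from none to all of their number.
    return least_held_heads, most_held_heads + rethrown

def impossible_cells(held, coins=7):
    """Cells (first toss, second toss) in which no entry can occur."""
    cells = []
    for first in range(coins + 1):
        least, most = second_toss_range(first, held, coins)
        cells.extend((first, second) for second in range(coins + 1)
                     if second < least or second > most)
    return cells

print(impossible_cells(1))        # one black dime
print(len(impossible_cells(2)))   # two black dimes
```

With one dime held the routine gives the two cells named above, and with all seven held every cell off the main diagonal is excluded.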

Determination of values of Ȳ for each X gives:

X    0    1    2    3    4    5    6    7
Ȳ    0   2.8  3.1  3.3  3.9  4.7  4.0  4.5

In only one case does Ȳ decrease for increase of X, though for
X = 6 and X = 7, Ȳ is less than for X = 5.

Table LXV is obtained in exactly the same way except that 
three dimes are blackened and left on the table for each second 
toss. The diagonal lines cutting off the cells in which entries are 
impossible cut off more cells than before, crowding the entries 
closer to the diagonal that slopes down to the right. The increased 
tendency to follow this diagonal indicates stronger correlation. 
(Why not the diagonal up to the right? Some statisticians follow 
this method of tabulating the variables and some the method 

used in Table LXI.) 




The values of Ȳ corresponding to each X are:

X    0    1    2    3    4    5    6    7
Ȳ    3   1.7  2.8  3.0  3.7  4.1  4.6  5.0

Except for X = 0, Ȳ = 3, the values of Ȳ are running along
with the corresponding values of X more closely than in the previous
tables. The exceptional value of Ȳ for X = 0 is due to a
single pair of tosses and is undoubtedly an accident of the sample.

In making Table LXVI four dimes were blackened and remained 
unchanged for the second toss of each pair. The diagonal lines 
are drawn still closer to the main diagonal of the table. The 
entries cluster still more closely about this main diagonal. The 
values of X and \bar{Y} follow:

    X          0    1    2    3    4    5    6    7
    \bar{Y}   1.7  2.2  2.6  3.4  3.9  4.2  5.4  6.0

In making Table LXVII five dimes were blackened, remaining 
unchanged in each second toss. The entries crowd closer to the 
main diagonal. The values of X and \bar{Y} are:

    X          0    1    2    3    4    5    6    7
    \bar{Y}   1.0  1.2  2.1  3.2  4.1  4.8  5.1  6.5

Now for nearly every value of X the value of Y is quite close 
to the value of X. 

For Table LXVIII six dimes were blackened and remained
unchanged for each second toss. All entries are now very close
to the main diagonal down to the right. The corresponding
values of X and \bar{Y} are:

    X          0    1    2    3    4    5    6    7
    \bar{Y}    0   1.2  2.2  3.1  4.0  4.5  5.7  6.0

In getting Table LXIX all the dimes of the first toss were left 
on the table and counted as they were for the second toss. The 
entries are all confined to the main diagonal down to the right. 

As soon as the value of X is known, the value of Y is known to
be the same. Everywhere 

Y = X. 

There is an equational relationship and perfect correlation. 

This survey has now gone through these eight tables recording
the values of \bar{Y}, the average number of heads in the second toss
corresponding to X, the number of heads in the first toss. It is found
that, by making the tosses in such a manner that the number of
heads, X, in the first toss has more and more influence in determining
how many heads shall fall on the second toss, \bar{Y} has a
stronger and stronger tendency to become equal to its corresponding
X. This means that correlation is becoming greater. It is
also noticed in the diagrams that the entries have a growing tend¬ 
ency to follow a diagonal of the figure as correlation increases. 
If for each table there had been 2,000 pairs of tosses of dimes 
instead of 200, these tendencies would undoubtedly have been 
more marked. 

In the first table there was very little correlation, if any. 
From the fact that the number of heads in the second toss was 
absolutely independent of the number in the first toss, it must 
be concluded that, if any correlation is indicated, it is an accident 
of the sample. 

One might just as well have associated the average number of 
heads on the first toss that occurred with each possible number on 
the second toss; that is, have found \bar{X} for each Y. The results
would have been similar. Increasing correlation would have 
been observed as progress was made through the successive 
tables, ending with the perfect correlation of Table LXIX. 
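
The dime experiment described above can be imitated on a computer. The following Python sketch is only an illustration, not the procedure used for the tables: the function name `conditional_means`, the seed, and the pseudo-random generator standing in for real dimes are assumptions. For each observed X it returns the average Y of the associated second tosses.

```python
import random
from collections import defaultdict

def conditional_means(n_fixed, n_coins=7, n_pairs=200, seed=0):
    """Average heads on the second toss (Y) for each first-toss count X."""
    rng = random.Random(seed)
    totals = defaultdict(lambda: [0, 0])  # X -> [sum of Y, number of pairs]
    for _ in range(n_pairs):
        first = [rng.random() < 0.5 for _ in range(n_coins)]
        # The first n_fixed coins are left on the table; the rest are tossed again.
        tossed_again = [rng.random() < 0.5 for _ in range(n_coins - n_fixed)]
        second = first[:n_fixed] + tossed_again
        x, y = sum(first), sum(second)
        totals[x][0] += y
        totals[x][1] += 1
    return {x: s / n for x, (s, n) in sorted(totals.items())}

# More coins left unchanged should pull the average Y toward X.
for fixed in (0, 2, 5, 7):
    means = conditional_means(fixed)
    print(fixed, {x: round(m, 1) for x, m in means.items()})
```

With all seven coins left unchanged the averages fall exactly on Y = X, the case of perfect correlation; with none left unchanged any apparent trend is an accident of the sample.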

6. Regression Lines and Equations.—An expression for a
measure of correlation may be obtained by first deriving the equations
of what are known as regression lines or characteristic lines.

Turning again to Table LXI, calculate the average Y associated
with each X. This means: calculate the average length of all
leaves having each given breadth. In other words, find the
average of each of the columns. Let \bar{Y} stand for this mean of each
column. Take a pair of coordinate axes and plot the points
(X, \bar{Y}), one for each column of the table. Find the straight line
of closest fit by the method of least squares. Take the y-axis
through a_X, the arithmetic mean of breadths X, and the x-axis
through a_Y, the arithmetic mean of lengths. Use x for deviations
from a_X, and y for deviations from a_Y. For any value of x, the
items in the corresponding column give an array of values of y.
This array is called the y-array of type x. Let n be the number of
items in any y-array, n' being the number in the first y-array,
n'' the number in the second y-array, and so on to n^{(r)}, the number
in the rth y-array, there being r such arrays in all. Let the equation
of the required line of closest fit be

    \bar{y} = b_1 x + a_1,

where \bar{y} is the mean of the y-array of type x.



It is required to make the coefficients a_1 and b_1 such that

    \sum [\bar{y} - (b_1 x + a_1)]^2

shall be a minimum.

Let RR' (Fig. 54) be such a line. For any given x = OA_1,

    b_1 x + a_1 = A_1C_1

and

    \bar{y} = A_1B_1.

    \bar{y} - (b_1 x + a_1) = C_1B_1

is the distance of the arithmetic mean of the y-array of type x
from the line RR'. The sum of the squares of these distances,
one for each of the N pairs of items in the entire table, must be a
minimum. In Chap. VIII, page 192, it was determined that
a_1 and b_1 must satisfy the normal equations:

    \sum [(b_1 x + a_1) - \bar{y}] = 0

and

    \sum x[(b_1 x + a_1) - \bar{y}] = 0.

Any y-array of type x, containing n items, contributes n
terms to each of the above summations. The sum of \bar{y} in the
y-array will then be n\bar{y}, and the sum of \bar{y} for the entire table will be
\sum n\bar{y}, n and \bar{y} each varying from column to column. The sum of
x\bar{y} for the entire table will be \sum x(n\bar{y}).


Expanding and transposing, the normal equations now become

    b_1 \sum x + N a_1 = \sum n\bar{y},
    b_1 \sum x^2 + a_1 \sum x = \sum x(n\bar{y}).

In any y-array,

    n\bar{y} = \sum y

for the n items in that array.

Then for the entire table

    \sum n\bar{y} = \sum y.

Similarly,

    \sum x(n\bar{y}) = \sum xy

for the entire table.

The normal equations may now be written:

    b_1 \sum x + N a_1 = \sum y,

and

    b_1 \sum x^2 + a_1 \sum x = \sum xy.

But since x is deviation from a_X and y is deviation from a_Y,

    \sum x = 0 = \sum y.

Therefore

    N a_1 = 0

and

    a_1 = 0.

Also,

    b_1 \sum x^2 = \sum xy

and

    b_1 = \frac{\sum xy}{\sum x^2}.

Now introduce what is known as the product moment,

    p = \frac{\sum xy}{N},

and a quantity r such that

    r = \frac{p}{\sigma_x \sigma_y}.

Then,

    b_1 = \frac{Np}{N\sigma_x^2} = \frac{r \sigma_x \sigma_y}{\sigma_x^2},

so that

    b_1 = r \frac{\sigma_y}{\sigma_x}.


The equation of the line of closest fit is, therefore,

    \bar{y} = r \frac{\sigma_y}{\sigma_x} x.

This is the regression equation of y on x. The line is the
regression line of y on x.

    b_1 = r \frac{\sigma_y}{\sigma_x}

is called the regression coefficient of y on x. Applied in Table LXI, 
it would be the regression coefficient of leaf length on leaf breadth. 
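
The two expressions for the regression coefficient of y on x can be compared numerically. The Python sketch below is offered only as an illustration: the function name `slope_y_on_x` and the seven pairs of measurements are hypothetical and are not taken from Table LXI.

```python
from math import sqrt

def slope_y_on_x(X, Y):
    """Return b1 two ways: sum(xy)/sum(x^2) and r * sigma_y / sigma_x."""
    n = len(X)
    a_x, a_y = sum(X) / n, sum(Y) / n
    x = [v - a_x for v in X]  # deviations from the arithmetic mean of X
    y = [v - a_y for v in Y]  # deviations from the arithmetic mean of Y
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    p = sum_xy / n  # product moment
    sigma_x, sigma_y = sqrt(sum_xx / n), sqrt(sum_yy / n)
    r = p / (sigma_x * sigma_y)
    return sum_xy / sum_xx, r * sigma_y / sigma_x

# Hypothetical breadths and lengths in millimeters.
breadths = [19, 22, 22, 25, 25, 28, 31]
lengths = [27, 32, 27, 37, 42, 37, 47]
b1_direct, b1_from_r = slope_y_on_x(breadths, lengths)
print(b1_direct, b1_from_r)
```

The two values agree to rounding error, as the derivation requires.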

7. Meaning of Regression Line.—This regression line is a line 
such that the sum of the squares of the distances of the arithmetic 
means of the columns, from the regression line, each mean counted 
as many times as there are items in the column, is the least 
possible. If some other line were drawn, the sum of the squares 
of the distances from it would be more than when this regression 
line is used. 

The equation of the line of regression may be regarded as an
equation from which to estimate the average y associated with
each x in such a manner as to make the sum of the squares of the
errors of estimate a minimum. Or, it may be considered as that
equation from which \bar{y}, the average of all the y's associated with
any value of x, may be estimated so that the sum of the squares of
the errors of estimate shall be a minimum, when each mean, \bar{y}, is
weighted by the number of items in the y-array determining it.
It is the line which best fits the points determined by the averages
of the columns in the scatter diagram.
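
That the regression line is the weighted line of closest fit to the column averages may be checked directly. In the hedged Python sketch below, the function names and the data are hypothetical; one slope is found from all N items, the other from the column means alone, each mean weighted by the number of items in its column.

```python
from collections import defaultdict

def slope_from_items(X, Y):
    """b1 = sum(xy) / sum(x^2), using every item in the table."""
    n = len(X)
    a_x, a_y = sum(X) / n, sum(Y) / n
    num = sum((xv - a_x) * (yv - a_y) for xv, yv in zip(X, Y))
    den = sum((xv - a_x) ** 2 for xv in X)
    return num / den

def slope_from_column_means(X, Y):
    """Weighted least-squares slope through the column means."""
    n = len(X)
    a_x, a_y = sum(X) / n, sum(Y) / n
    columns = defaultdict(list)
    for xv, yv in zip(X, Y):
        columns[xv].append(yv)  # one y-array for each value of X
    num = den = 0.0
    for xv, ys in columns.items():
        weight = len(ys)                   # n, the items in the y-array
        x = xv - a_x                       # deviation of the column from a_X
        y_bar = sum(ys) / weight - a_y     # deviation of the column mean from a_Y
        num += weight * x * y_bar
        den += weight * x * x
    return num / den

# Hypothetical breadths and lengths in millimeters.
breadths = [19, 22, 22, 25, 25, 25, 28, 28, 31]
lengths = [27, 32, 27, 37, 42, 32, 37, 42, 47]
print(slope_from_items(breadths, lengths), slope_from_column_means(breadths, lengths))
```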

8. Another Regression Line.—In the same manner may be
found the line of best fit to the points determined by the averages
of the rows in the scatter diagram. Its equation would be of the
form

    \bar{x} = b_2 y + a_2.

Let SS' (Fig. 54) be such a line. For any x-array of type y =
OA_2 let \bar{x} = A_2B_2 be the average x in that array, and A_2C_2 =
b_2 y + a_2.

Determine b_2 and a_2 so that

    \sum [\bar{x} - (b_2 y + a_2)]^2

shall be a minimum. A little consideration of the symmetry of the
figure will show that the required regression equation is the
equation of RR' with x and y everywhere interchanged. Interchanging
x and y in the equation of RR' gives the equation of SS',

    \bar{x} = r \frac{\sigma_x}{\sigma_y} y,

and

    b_2 = r \frac{\sigma_x}{\sigma_y}.


This equation is the regression equation of x on y, and b_2 is
the regression coefficient of x on y. The equation is the equation
for estimating \bar{x}, the average x in the x-array associated with any y,
in such a manner as to make the sum of the squares of the errors of
estimate a minimum, each \bar{x} being counted as many times as there
are items in the x-array to which it belongs.

The slope of RR' (Fig. 54) equals A_1C_1/OA_1.

So

    b_1 = r \frac{\sigma_y}{\sigma_x} = \frac{A_1C_1}{OA_1}.

Similarly,

    b_2 = r \frac{\sigma_x}{\sigma_y} = \frac{A_2C_2}{OA_2}.

9. Regression Coefficients are Constants.—The two standard
deviations, \sigma_x and \sigma_y, are statistical constants which can be
computed from the frequency tables. The quantity r = p/(\sigma_x \sigma_y),
where p = \sum xy/N, may also be computed from the scatter diagram,
or correlation table. Hence b_1 and b_2 are statistical constants
which may be at once computed from the correlation table.
With b_1 and b_2 known, the two regression equations may be
written at once.

10. Averages Exactly on Straight Lines.—If the averages of the
columns happen to fall exactly on a straight line RR', then, upon
selecting any x determining a y-array, the average of the items in
that array is known at once from

    \bar{y} = r \frac{\sigma_y}{\sigma_x} x,

the equation of RR'. If the line slopes up to the right, as in Fig.
54, then the larger the x selected the larger the average y
associated with it. This would indicate direct correlation. Similarly,
if the averages of the rows happen to fall exactly on
a straight line SS', then the average x in any x-array of type y
may be found from the regression equation

    \bar{x} = r \frac{\sigma_x}{\sigma_y} y.

If the line slopes as does SS' in Fig. 54, then the larger the y
selected the larger is the average of the x's associated with it.
This would indicate direct correlation.




11. Averages Not on Straight Lines.—It seldom happens that 
either set of averages falls exactly on a straight line. However, 
they frequently fall sufficiently close to straight lines so that the 
deviations from the regression line may be assumed as errors of 
sampling. If they are so scattered that this assumption is not 
true, then the regression line, the straight line of closest fit, has no 
significance. In such case, it may happen that the averages 
tend to follow some other mathematical curve and so still show 
correlation. If the averages do tend to follow a straight line, the 
regression is called linear regression. If the averages tend to 
follow some other curve, there is said to be non-linear regression , 
and the correlation is called skew correlation. 

12. Coincidence of Regression Lines.—If the averages of the 
columns and the averages of the rows fall on the same straight 
line, there is equational relationship between the two sets of 
averages indicating perfect correlation. In that case, the lines 
RR' and SS' coincide. 

13. Regression Line Falling along a Coordinate Axis.—If the
averages of the columns fall all on the x-axis, RR' coincides with
the x-axis. The size of any selected x would then have no influence
on the average of the associated y-array. Similarly, if SS'
coincides with the y-axis, the size of any selected y has no influence
on the average of the associated x-array. In either case, there is
no connection between the size of one variable and the associated
sizes of the other variable. In other words, there is no
correlation.


14. Regression Lines and Correlation.—Evidently, the nearer
RR' and SS' come to coincidence, the greater the degree of correlation
between the two variables. The nearer RR' approaches
the x-axis or SS' approaches the y-axis, the slighter the degree of
correlation.

15. The Product b_1 b_2 as a Measure of Correlation.—If RR'
and SS' coincide,

    b_1 = \frac{1}{b_2}.

This is seen at once from Fig. 54. If OC_1 and OC_2 are taken
equal and either line (or both) is rotated to coincide with the
other, C_1 and C_2 will coincide and then,

    OA_1 = A_2C_2

and

    A_1C_1 = OA_2.

Then,

    \frac{A_1C_1}{OA_1} will equal \frac{OA_2}{A_2C_2}.

But

    \frac{A_1C_1}{OA_1} = b_1 and \frac{OA_2}{A_2C_2} = \frac{1}{b_2}.

Therefore,

    b_1 = \frac{1}{b_2}

in this case.

Then,

    b_1 b_2 = 1.

If RR' coincides with the x-axis,

    b_1 = 0.

If SS' coincides with the y-axis,

    b_2 = 0.

In either case,

    b_1 b_2 = 0.

The farther RR' and SS' recede from the x-axis and the y-axis,
respectively, and the nearer they approach coincidence,
the greater the degree of correlation, and the nearer does b_1 b_2
approach 1.

The quantity b_1 b_2 then makes a good measure of correlation in
the case of linear regression, since, when it becomes 0, there is no
correlation, and the nearer it gets to 1, the greater the degree of
correlation, showing perfect correlation when it equals 1.

If, when RR' and SS' coincide, the line slopes up to the right, 
the correlation is direct or positive. If it slopes down to the 
right, correlation is inverse or negative. 

16. The Quantity r as a Measure of Correlation.—Now

    b_1 b_2 = r \frac{\sigma_y}{\sigma_x} \cdot r \frac{\sigma_x}{\sigma_y}.

Therefore,

    r^2 = b_1 b_2

and

    r = \sqrt{b_1 b_2}.

It follows that r becomes an excellent measure of correlation
in the case of linear regression. If the correlation is direct, the
plus sign is used with r. If correlation is inverse, the minus sign
is used.
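
The relation r^2 = b_1 b_2 and the rule of signs may be verified on a small set of figures. The following Python sketch rests on assumptions: the function name `slopes_and_r` and the data are hypothetical, and the sign of r is taken from the common sign of the two regression coefficients.

```python
from math import sqrt, copysign

def slopes_and_r(X, Y):
    """Return b1 (y on x), b2 (x on y), and Pearson's r."""
    n = len(X)
    a_x, a_y = sum(X) / n, sum(Y) / n
    sum_xy = sum((xv - a_x) * (yv - a_y) for xv, yv in zip(X, Y))
    sum_xx = sum((xv - a_x) ** 2 for xv in X)
    sum_yy = sum((yv - a_y) ** 2 for yv in Y)
    return sum_xy / sum_xx, sum_xy / sum_yy, sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical measurements; reversing the lengths gives inverse correlation.
breadths = [19, 22, 22, 25, 25, 28, 31]
lengths = [27, 32, 27, 37, 42, 37, 47]
for ys in (lengths, lengths[::-1]):
    b1, b2, r = slopes_and_r(breadths, ys)
    # Geometric mean of the two slopes, carrying their common sign.
    r_from_slopes = copysign(sqrt(b1 * b2), b1)
    print(round(b1, 3), round(b2, 3), round(r, 3), round(r_from_slopes, 3))
```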


It is necessary that b_1 and b_2 have the same sign, or r would
be an imaginary quantity. In Fig. 54, b_1 and b_2 are both
positive. If RR' and SS' sloped down to the right, both b_1 and
b_2 would be negative. In either case, b_1 b_2 is positive and r is
real; r is the geometric mean of b_1 and b_2.

17. Formula for r.—Going back to the original definition of
r on page 224:

    r = \frac{p}{\sigma_x \sigma_y}

and

    p = \frac{\sum xy}{N}.

That is,

    r = \frac{\sum xy}{N \sigma_x \sigma_y}.

This is the form in which it is usually written. It is known as 
Pearson's coefficient of correlation. It is applicable to distributions 
having approximate linear regression. In case of skew correla¬ 
tion, some other measure of correlation must be used. Trans¬ 
lating the formula for r into English, it reads: The coefficient of 
correlation equals the product moment of the associated variables 
divided by the product of their standard deviations. 

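Substituting \sum x^2/N for \sigma_x^2 and \sum y^2/N for \sigma_y^2 in

    r = \frac{\sum xy}{N \sigma_x \sigma_y}

gives

    r = \frac{\sum xy}{N \sqrt{\frac{\sum x^2}{N}} \sqrt{\frac{\sum y^2}{N}}}

or

    r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}.

If the standard deviations of the variables are not needed, this
formula gives a more direct means of computing r.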

18. Short-cut Formula.—Let the short-cut methods of computing
standard deviation and average be recalled. For a_X, the average
of X, an arbitrary origin X' was chosen and it was found that

    a_X = X' + \frac{\sum d_{X'}}{N},

the arbitrary value chosen plus the average deviation from the
arbitrary value chosen.

It was also found that

    \sigma_x = \sqrt{\frac{\sum d_{X'}^2}{N} - (a_X - X')^2}.

From the short-cut equation for a_X,

    a_X - X' = \frac{\sum d_{X'}}{N}.

Let x' = d_{X'} = X - X' and y' = d_{Y'} = Y - Y', and let \bar{x}' and
\bar{y}' be the averages of x' and y'.
Now the short-cut equations may be written

    a_X = X' + \bar{x}',

    \sigma_x = \sqrt{\frac{\sum x'^2}{N} - \bar{x}'^2}.

Similarly,

    a_Y = Y' + \bar{y}'

and

    \sigma_y = \sqrt{\frac{\sum y'^2}{N} - \bar{y}'^2}.



It is now desired to find an expression for product moment,
\sum xy/N, in terms of x' and y'.

    x' = X - X'
       = (x + a_X) - (a_X - \bar{x}')
       = x + \bar{x}'.

This may be seen graphically by the following diagram, in
which O is the origin of measurements:

    OX' = X', OM = a_X, OX = any X.
    X'M = a_X - X' = \bar{x}',
    MX = X - a_X = x,
    X'X = X - X' = x'.

Now directly from the figure:

    x' = x + \bar{x}',

true for every X.








Similarly,

    y' = y + \bar{y}'.

Then,

    x'y' = (x + \bar{x}')(y + \bar{y}')
         = xy + \bar{x}'y + \bar{y}'x + \bar{x}'\bar{y}'.

So

    \sum x'y' = \sum xy + \bar{x}' \sum y + \bar{y}' \sum x + N\bar{x}'\bar{y}'.

But

    \sum x = 0 = \sum y,

since x and y are deviations from the respective arithmetic means.
Therefore,

    \sum x'y' = \sum xy + N\bar{x}'\bar{y}'

and

    \sum xy = \sum x'y' - N\bar{x}'\bar{y}'.

The formula for r now becomes:

    r = \frac{\sum x'y' - N\bar{x}'\bar{y}'}{N \sigma_x \sigma_y}.
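
The short-cut formula should give the same r whatever arbitrary origins are chosen. The Python sketch below is an illustration under stated assumptions: the function name `r_short_cut` and the measurements are hypothetical; the origins 25 and 37 are used only because they lie near the middle of the hypothetical figures.

```python
from math import sqrt

def r_short_cut(X, Y, origin_x, origin_y):
    """Pearson's r from deviations x', y' measured from arbitrary origins."""
    n = len(X)
    xp = [v - origin_x for v in X]  # x' = X - X'
    yp = [v - origin_y for v in Y]  # y' = Y - Y'
    mean_xp, mean_yp = sum(xp) / n, sum(yp) / n
    sigma_x = sqrt(sum(v * v for v in xp) / n - mean_xp ** 2)
    sigma_y = sqrt(sum(v * v for v in yp) / n - mean_yp ** 2)
    # sum(xy) = sum(x'y') - N * mean(x') * mean(y')
    sum_xy = sum(a * b for a, b in zip(xp, yp)) - n * mean_xp * mean_yp
    return sum_xy / (n * sigma_x * sigma_y)

# Hypothetical breadths and lengths in millimeters.
breadths = [19, 22, 22, 25, 25, 28, 31]
lengths = [27, 32, 27, 37, 42, 37, 47]
print(r_short_cut(breadths, lengths, 25, 37))
print(r_short_cut(breadths, lengths, 0, 0))  # arbitrary origin at zero
```

Taking both origins at zero reduces the computation to sums of the original measurements and their products.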

19. Arbitrary Origin at Zero.—As a special case, take the arbitrary
numbers X' and Y' at the origin of measurements, that is:

    X' = Y' = 0.

Then,

    x' = X, y' = Y,

    \bar{x}' = \frac{\sum X}{N} = a_X, \bar{y}' = \frac{\sum Y}{N} = a_Y,

    \sigma_x = \sqrt{\frac{\sum X^2 - N a_X^2}{N}}, \sigma_y = \sqrt{\frac{\sum Y^2 - N a_Y^2}{N}}.

And

    \sum xy = \sum XY - N a_X a_Y.

    r = \frac{\sum XY - N a_X a_Y}{N \sqrt{\frac{\sum X^2 - N a_X^2}{N}} \sqrt{\frac{\sum Y^2 - N a_Y^2}{N}}}

or

    r = \frac{\sum XY - N a_X a_Y}{\sqrt{(\sum X^2 - N a_X^2)(\sum Y^2 - N a_Y^2)}}.

This gives r in terms of the original measurements X and Y
and their arithmetic means.

20. ... of a Correlation Table.—Now take the correlation table
of leaf lengths and breadths, arranged for getting the arithmetic
means and standard deviations of the two variables, length and
breadth, and the coefficient of correlation between these two
variables. This forms Table LXX.

The arbitrary origin of measurement should be near the middle 
of the distribution. 

Let 

X' = 25 mm. 

and 

Y' = 37 mm. 

The classes having these midpoint values are ruled off by heavy 
lines, for convenience. Let the class interval be the unit of 
measurement. 

    X' = 25/3 = 8 1/3 class intervals.

    Y' = 37/5 = 7.4 class intervals.

Then deviations, x' and y' from X' and Y', are multiples of
class intervals. Columns are made for y', fy', fy'^2, fx'y', and
f(y' + 1)^2. Additional rows are made for x', fx', fx'^2, fx'y', and
f(x' + 1)^2.

In the y' column, begin with 0 at the arbitrary origin and write
1, 2, etc. upward to the top of the table. Write -1, -2, etc.
downward to the bottom of the table, the class interval being
the unit of measurement. Column fy' is simply the product of f
by y'; column fy'^2, the product of y' and fy'. The rows x', fx',
fx'^2 are obtained in an exactly similar manner. Column fx'y' is
obtained as follows: In the lower right-hand corner of the
occupied cells are written small figures. Each of these is the
x'y' product for each item in the cell. For each cell in column
headed 25, x' = 0. Then x'y' = 0 for each cell in that column.
For each cell in column headed 31, x' = 2. For each cell in
column headed 19, x' = -2. For each cell in any column x'
equals the x' entry for that column. Similarly, the y' for each
cell in any row is the y' entry for that row. Then the x'y' entry
for any cell is the product of the x' entry for the column in which
the cell stands, by the y' entry for the row in which it stands.
For column headed 28, x' = 1. So the small figures are each one
times the corresponding y'. For column headed 31, x' = 2.
So the small figures are each two times the corresponding y'.
That is, they start at 0 and go up and down by multiples of 2 and
-2, respectively. For column headed 34, the small figures
proceed from 0 up and down by multiples of +3 and -3, respectively.
Similar formation of the x'y' products for every cell in
the table follows readily. If much correlation work is to be done,
it may well pay to have printed sheets ruled and the proper
x'y' entry printed in each cell.

Having completed the proper small figure entries in the cells 
and having recorded frequencies, at once obtain the fx'y' entries. 
In each cell, of course, there is an x'y' for each item that falls in 
that cell. So multiply each cell frequency in a row by the x'y' 
for the cell and sum across the entire row. Thus, for the first 
row, 

fx'y' = 1 X 6 + 1 X 9 = 15. 

For the second row, 

fx'y' = 2 X 0 + 3 X 2 + 1 X 4 + 1 X 6 = 16.

For the third row, 

fx'y' = 1 X (-1) + 3X0 + 5X1 + 3X2 = 10. 

The fourth row gives 0. 

For the fifth row, 

fx'y' = 1X2 + 3X1+4X0 + 3X (-1) + 2 X (-2) = -2. 

Continue the process for all rows, entering the sums in the
fx'y' column. Be sure to observe algebraic signs. A similar
process applied to the columns gives the row headed fx'y'. This
row is merely inserted as a check; the total of fx'y' should be the
same both ways. The f(y' + 1)^2 column is self-explanatory. For any
row, add 1 to y', square, and multiply by f. For the first row,

    (3 + 1)^2 X 2 = 32.

The row f(x' + 1)^2 is obtained similarly. Find the algebraic
sum of every column and row except column y' and row x'.

21. Charlier Check.—The column f(y' + 1)^2 is the Charlier
check on the other work.

    f(y' + 1)^2 = fy'^2 + 2fy' + f.

So

    \sum f(y' + 1)^2 = \sum fy'^2 + 2\sum fy' + N.

In this table,

    \sum f(y' + 1)^2 = 181,

    \sum fy'^2 + 2\sum fy' + N = 139 + 2(-9) + 60 = 181,

which checks.

Similarly, for f(x' + 1)^2,

    211 = 109 + 2 \times 21 + 60.
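
The Charlier check is easily mechanized. In the Python sketch below the function name `charlier_check` is hypothetical, and the column of frequencies is an assumption: it is one set of row totals consistent with the sums quoted above (N = 60, 139, -9, 181), since the body of Table LXX is not reproduced here.

```python
def charlier_check(freqs, devs):
    """Return both sides of sum f(d+1)^2 = sum f d^2 + 2 sum f d + N."""
    n = sum(freqs)
    sum_fd = sum(f * d for f, d in zip(freqs, devs))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, devs))
    left = sum(f * (d + 1) ** 2 for f, d in zip(freqs, devs))
    right = sum_fd2 + 2 * sum_fd + n
    return left, right

# Assumed row frequencies for y' = 3, 2, 1, 0, -1, -2, -3 (illustrative only).
y_devs = [3, 2, 1, 0, -1, -2, -3]
row_freqs = [2, 7, 12, 14, 13, 8, 4]
left, right = charlier_check(row_freqs, y_devs)
print(left, right)
```

If an entry in the fy' or fy'^2 column has been miscopied, the two sides will generally fail to agree.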



Table LXX.—Correlation Table for Lengths and Breadths of 60 Leaves. X = Breadths, Y = Lengths, in Millimeters

22. Computation of the Constants.—Now obtain the constants
used in the formulas.

    \bar{x}' = \frac{\sum fx'}{N} = \frac{21}{60} = 0.35.

    a_X = X' + \bar{x}' = 8.33 + 0.35 = 8.68.

    \sigma_x = \sqrt{\frac{\sum fx'^2}{N} - \bar{x}'^2} = \sqrt{\frac{109}{60} - (0.35)^2} = 1.30.

    \bar{y}' = \frac{\sum fy'}{N} = \frac{-9}{60} = -0.15.

    a_Y = Y' + \bar{y}' = 7.40 + (-0.15) = 7.25,

    \sigma_y = \sqrt{\frac{\sum fy'^2}{N} - \bar{y}'^2} = \sqrt{\frac{139}{60} - (-0.15)^2} = 1.51.

All these results are in class intervals as units of measurement.
To get the results in millimeters, the X-results, breadths,
must be multiplied by 3, the X class interval, and the Y-results,
lengths, by 5, the Y class interval. This gives

    a_X = 8.68 \times 3 = 26.04 mm.
    a_Y = 7.25 \times 5 = 36.25 mm.
    \sigma_x = 1.30 \times 3 = 3.90 mm.
    \sigma_y = 1.51 \times 5 = 7.55 mm.

23. Value of r.—For Pearson’s coefficient of correlation,

    r = \frac{\sum x'y' - N\bar{x}'\bar{y}'}{N \sigma_x \sigma_y} = \frac{71 - 60(0.35)(-0.15)}{60 \times 1.30 \times 1.51}.

Whence,

    r = 0.63.

The formula for probable error of r has been given as

    p.e. of r = \frac{0.6745(1 - r^2)}{\sqrt{N}}.

In this problem

    p.e. of r = \frac{0.6745(1 - 0.397)}{\sqrt{60}} = 0.05.

So the coefficient of correlation is written 


r = 0.63 ± 0.05. 

The chances are even that the true value of r differs from 0.63 
by not more than 0.05. 
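
The whole computation may be repeated from the totals of Table LXX quoted in the text. The Python sketch below uses only those totals (21, 109, -9, 139, 71, and N = 60); the variable names are assumptions made for illustration.

```python
from math import sqrt

N = 60
sum_fx, sum_fx2 = 21, 109   # totals of rows fx' and fx'^2
sum_fy, sum_fy2 = -9, 139   # totals of columns fy' and fy'^2
sum_fxy = 71                # total of column fx'y'

x_mean = sum_fx / N         # average deviation from X', in class intervals
y_mean = sum_fy / N         # average deviation from Y', in class intervals
sigma_x = sqrt(sum_fx2 / N - x_mean ** 2)
sigma_y = sqrt(sum_fy2 / N - y_mean ** 2)

# Short-cut formula for Pearson's coefficient and its probable error.
r = (sum_fxy - N * x_mean * y_mean) / (N * sigma_x * sigma_y)
probable_error = 0.6745 * (1 - r ** 2) / sqrt(N)

print(round(x_mean, 2), round(y_mean, 2))
print(round(sigma_x, 2), round(sigma_y, 2))
print(round(r, 2), round(probable_error, 2))
```

Carried to two decimals, these figures agree with those obtained above by hand.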

In the formula for r, it is not necessary to multiply x' and
y' by the respective class intervals. If this were done, both
numerator and denominator would be multiplied by both class
intervals, and the value of the fraction would not be changed.
In this problem, the numerator and the denominator would each
be multiplied by 3 X 5 = 15.

Other measures of the degree of correlation have been devised. 
Pearson’s coefficient, the product moment method, is based on a 
standard deviation about the line of regression, since it involves 
the sum of the squares of the deviations of the row (and column) 
averages from the line of regression. 

24. Correlation Ratio.—If there is skew correlation, that is, 
if the row (or column) averages do not tend to follow a straight 
line, but some other curve, a method would be based on an aver¬ 
age standard deviation with respect to such curve. 

Let s_{ax} be the standard deviation of any x-array consisting of
n items. Let \sigma_{ax}^2 be a weighted mean of all the s_{ax}^2's, the weights
being the corresponding values of n. Then,

    \sigma_{ax}^2 = \frac{\sum n s_{ax}^2}{N}.


This \sigma_{ax} is based on a line going through the averages of the
rows, since every s_{ax} is a standard deviation from the average of
the corresponding row. It can be seen that this weighted mean
of standard deviations of row arrays will be less than the standard
deviation of the entire set of N observations.^1

Then write

    \sigma_{ax}^2 = \sigma_x^2 (1 - \eta_{XY}^2)

(eta square, X on Y). Solving for \eta_{XY}^2,

    \eta_{XY}^2 = \frac{\sigma_x^2 - \sigma_{ax}^2}{\sigma_x^2}.

Thus, \eta_{XY}^2 is the ratio of the reduction of the square of standard
deviation to the square of the standard deviation itself when
the weighted average square standard deviation of rows is
used. Professor Pearson calls \eta the correlation ratio. It may
also be shown^2 that

    \sigma_x^2 - \sigma_{ax}^2 = \sigma_{mx}^2,

where \sigma_{mx} is the weighted standard deviation of the means of the
x-arrays from a_X. Therefore, write

    \eta_{XY}^2 = \frac{\sigma_{mx}^2}{\sigma_x^2}.


^1 This may be seen in an example by inspection of the table, in which the
standard deviation of each row (or column) is less than that of the entire
table. The mathematical proof is not given here.

^2 G. Udny Yule, “Theory of Statistics,” p. 205 (4).


It is evident that if one starts with, say, standard deviation of
any y-array, another value of \eta designated by \eta_{YX} will result.
The same line of reasoning will give

    \eta_{YX} = \frac{\sigma_{my}}{\sigma_y}.
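
The correlation ratio can be computed from ungrouped pairs by forming the x-arrays directly. The Python sketch below is a hedged illustration: the function name `correlation_ratio_x_on_y` and the data are hypothetical. It also checks the relation that the total square standard deviation less the weighted mean square standard deviation of the arrays equals the square standard deviation of the array means.

```python
from collections import defaultdict

def correlation_ratio_x_on_y(X, Y):
    """Return eta^2 (X on Y) with the three squared standard deviations."""
    n = len(X)
    a_x = sum(X) / n
    var_total = sum((v - a_x) ** 2 for v in X) / n      # sigma_x^2
    arrays = defaultdict(list)
    for xv, yv in zip(X, Y):
        arrays[yv].append(xv)                            # x-array of type y
    var_within = 0.0   # sigma_ax^2, weighted mean of the array variances
    var_means = 0.0    # sigma_mx^2, weighted variance of the array means
    for xs in arrays.values():
        mean = sum(xs) / len(xs)
        var_within += sum((v - mean) ** 2 for v in xs) / n
        var_means += len(xs) * (mean - a_x) ** 2 / n
    return var_means / var_total, var_total, var_within, var_means

# Hypothetical breadths and lengths in millimeters.
breadths = [19, 22, 22, 25, 25, 28, 31, 25]
lengths = [27, 32, 27, 37, 42, 37, 47, 32]
eta_sq, var_total, var_within, var_means = correlation_ratio_x_on_y(breadths, lengths)
print(round(eta_sq, 3), round(var_total - var_within, 3), round(var_means, 3))
```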


25. Example.—In Table LXX are placed additional columns
for determining \sigma_{mx} and additional rows for determining \sigma_{my}.
Having these determined, the correlation ratio of X on Y,
\eta_{XY}, and of Y on X, \eta_{YX}, may be at once computed.

In Table LXX, the column heading x'_{ax} is the deviation of the
average of an x-array, measured in class intervals, from the
arbitrary origin.

For the first x-array,

    x'_{ax} = \frac{1 \times 2 + 1 \times 3}{2} = 2.50.

For the second x-array,

    x'_{ax} = \frac{2 \times 0 + 3 \times 1 + 1 \times 2 + 1 \times 3}{7} = 1.14.

For the third x-array,

    x'_{ax} = \frac{1 \times (-1) + 3 \times 0 + 5 \times 1 + 3 \times 2}{12} = 0.83.

And so on through all the x-arrays. The next two columns
are self-explanatory.

The row heading y'_{ay} is the deviation of the average of a y-array,
measured in class intervals, from the arbitrary origin.

For the first y-array,

    y'_{ay} = \frac{1 \times (-3)}{1} = -3.00.

For the fourth y-array,

    y'_{ay} = \frac{2 \times 2 + 3 \times 1 + 5 \times 0 + 4 \times (-1) + 3 \times (-2)}{17} = -0.18.

And so on for each y-array. The next two rows are self- 
explanatory. 



Let \bar{x} be the deviation of the average of an x-array from the
average of the entire sample. From the following diagram, it
is seen that

    x'_{ax} = \bar{x} + \bar{x}'.

(Diagram: the points O, X', M, A in order on a line, with X'M = \bar{x}',
MA = \bar{x}, and X'A = x'_{ax}.)

X' = OX' = arbitrary value, a_X = OM = average of all X’s,
OA = average of any x-array. The deviation of a_X from X' =
OM - OX' = \bar{x}', previously computed. OA - OM = \bar{x}. OA -
OX' = x'_{ax}. And x'_{ax} = \bar{x} + \bar{x}'.

Then, in any one row of n items,

    n x'^2_{ax} = n(\bar{x} + \bar{x}')^2
                = n\bar{x}^2 + 2n\bar{x}\bar{x}' + n\bar{x}'^2.

Then, for the entire table,

    \sum n x'^2_{ax} = \sum n\bar{x}^2 + 2\bar{x}' \sum n\bar{x} + \bar{x}'^2 \sum n
                     = \sum n\bar{x}^2 + N\bar{x}'^2.

Since \sum n\bar{x} equals summation for all rows of the total deviations
from a_X of all items in each row, and so is the sum of the deviations
from a_X for all N items, \sum n\bar{x} = 0.

From the above equation,

    \frac{\sum n\bar{x}^2}{N} = \frac{\sum n x'^2_{ax} - N\bar{x}'^2}{N}.

\sum n\bar{x}^2 is the sum of the weighted square deviations of the averages
of the rows from a_X. Therefore, by definition,

    \sigma_{mx}^2 = \frac{\sum n\bar{x}^2}{N} = \frac{\sum n x'^2_{ax}}{N} - \bar{x}'^2.

Similarly,

    \sigma_{my}^2 = \frac{\sum n\bar{y}^2}{N} = \frac{\sum n y'^2_{ay}}{N} - \bar{y}'^2.

Now from the last column of Table LXX and the previously
computed \bar{x}',

    \sigma_{mx}^2 = \frac{55.22}{60} - (0.35)^2
                  = 0.9203 - 0.1225
                  = 0.7978.

    \sigma_{mx} = \sqrt{0.7978} = 0.89.

Then,

    \eta_{XY} = \frac{\sigma_{mx}}{\sigma_x} = \frac{0.89}{1.30} = 0.68.

In the same manner,

    \sigma_{my} = \sqrt{\frac{60.22}{60} - (-0.15)^2} = 0.98.

And

    \eta_{YX} = \frac{0.98}{1.51} = 0.65.


The f of the table is the same as the n of the formulas.

The quantity \eta_{XY} is the correlation ratio of X on Y, \eta_{YX} the
correlation ratio of Y on X.

26. Degree of Departure from Linear Regression.—In every
case \eta is greater than the numerical value of r unless there is
strict linear regression giving r = ±1, when \eta = r. The difference
\eta^2 - r^2 gives a measure of the degree of departure from
linear regression. An approximate value of the probable error of
\eta^2 - r^2 is given^1 as

    \frac{2(0.6745)}{\sqrt{N}} \sqrt{\eta^2 - r^2} = \frac{1.3490}{\sqrt{N}} \sqrt{\eta^2 - r^2}.

If \eta^2 - r^2 is more than 2.5 times its probable error, it quite
likely indicates a significant departure from linear regression.
In other words, if

    \frac{\sqrt{N}}{1.3490} \sqrt{\eta^2 - r^2} < 2.5,

regression may be regarded as linear.

Applying this test to

    \eta_{XY} = 0.68,

    \frac{\sqrt{60}}{1.3490} \sqrt{(0.68)^2 - (0.63)^2} = \frac{7.75}{1.3490} \sqrt{0.4624 - 0.3969}
        = \frac{7.75 \sqrt{0.0655}}{1.3490} = \frac{7.75 \times 0.26}{1.3490} = 1.5 < 2.5.

For \eta_{YX} = 0.65, the result, 0.92 < 2.5, is obtained. This means
that with the given correlation table it is permissible to regard
regression as linear.
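
The test of linearity reduces to a single ratio, which may be computed as in the following Python sketch. The function name `linearity_ratio` is an assumption; the figures 0.68, 0.65, 0.63, and N = 60 are those of the example, and 2.5 is the limit stated above.

```python
from math import sqrt

def linearity_ratio(eta, r, n):
    """Ratio of eta^2 - r^2 to its approximate probable error."""
    # (eta^2 - r^2) / (2 * 0.6745 * sqrt(eta^2 - r^2) / sqrt(n))
    return sqrt(n) * sqrt(eta ** 2 - r ** 2) / (2 * 0.6745)

ratio_xy = linearity_ratio(0.68, 0.63, 60)
ratio_yx = linearity_ratio(0.65, 0.63, 60)
for name, ratio in (("eta_XY", ratio_xy), ("eta_YX", ratio_yx)):
    verdict = "linear" if ratio < 2.5 else "significant departure"
    print(name, round(ratio, 2), verdict)
```

Both ratios fall below 2.5, in agreement with the conclusion reached above.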

In the case of strict linear regression with RR' coinciding with
SS',

    \eta_{XY}^2 = \eta_{YX}^2 = r^2 = 1.

Even with strict linear regression, fluctuations due to sampling
will likely make \eta^2 differ slightly from r^2.

^1 Truman L. Kelley, “Statistical Method,” p. 239.



27. Correlation when r = 0.—In case the curve of regression is
a periodic wave curve, r may be very small or zero, though there
is perfect correlation. The correlation ratio is, then, much
better to use for a measure of correlation. The correlation coefficient
r, being smaller than the correlation ratio \eta, is more
conservative than \eta. Sometimes it is too much so.

The coefficient r, together with \sigma_x and \sigma_y, defines the slopes of
the regression lines and so furnishes a means of determining
general tendencies when regression is sufficiently close to linearity.
For instance, the general rate of increase of leaf breadth with
increase of length, or the general rate of increase of leaf length
with increase of breadth in the universe represented by the given
sample, is indicated.

28. Regression Equations Referred to Axes through (0, 0).—
The regression equation of x on y,

    x = r \frac{\sigma_x}{\sigma_y} y,

gives the proportional change in the X variable to be expected
with unit change in the Y variable. This equation is referred to
x- and y-axes through the arithmetic means, a_X and a_Y, of the two
variables. It is a simple matter to transform the equation to
X- and Y-axes through the zeros of the two scales of original
measurements of X and Y. In Fig. 55, the point P has coordinates
x and y equal to ME and EP, referred to the x-axis and y-axis,
and coordinates X and Y equal to OA and AP, referred to the X-axis
and Y-axis. Since O is at the zero of both scales and M at the
arithmetic means, OC = a_X and OD = a_Y. It is seen, at once,
that

    X = OC + ME = a_X + x,

and

    Y = OD + EP = a_Y + y.

Whence,

    x = X - a_X

and

    y = Y - a_Y.

Fig. 55.

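To get the transformed equation, substitute these values of x
and y in the original equation.

    x = r \frac{\sigma_x}{\sigma_y} y

becomes

    X - a_X = r \frac{\sigma_x}{\sigma_y} (Y - a_Y)

or

    X = r \frac{\sigma_x}{\sigma_y} Y + \left(a_X - r \frac{\sigma_x}{\sigma_y} a_Y\right).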

29. Example.—Applying this to the leaf lengths and breadths
of Table LXX for which

    a_X = 26.04,
    a_Y = 36.25,
    \sigma_x = 3.90,
    \sigma_y = 7.55,
    r = 0.63,

the result is

    X = \frac{0.63 \times 3.90}{7.55} Y + \left(26.04 - \frac{0.63 \times 3.90}{7.55} \times 36.25\right)

or

    X = 0.33Y + 14.26.

This is the equation from which to estimate the average leaf
breadth X to be expected for any given length Y on the assumption
of linear regression.

For example, if length Y is 27, the expected average breadth X
associated with it is

    X = 0.33 \times 27 + 14.26
      = 23.17.

By inspection of the table, the average breadth of leaves in
the sample of 60 associated with length 27 is 23.5.




Similarly, the regression equation

    y = r \frac{\sigma_y}{\sigma_x} x

is transformed into

    Y = r \frac{\sigma_y}{\sigma_x} X + \left(a_Y - r \frac{\sigma_y}{\sigma_x} a_X\right).

For the table of leaf lengths and breadths, this becomes

    Y = 1.22X + 4.49.

For breadth X = 22 the equation calls for an average length Y
of

    Y = 1.22 \times 22 + 4.49
      = 31.33.

Inspection of the table shows an average length of 31.5 for
breadth of 22.
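
The transformation to the original axes is the same for either regression line, and may be written once for both. In the Python sketch below the function name `to_original_axes` is hypothetical; the constants are those of the example. Because the slope is carried unrounded here, the intercept of the X-on-Y line comes out near 14.24 rather than the 14.26 of the text, a difference of rounding only.

```python
def to_original_axes(r, sigma_dep, sigma_ind, mean_dep, mean_ind):
    """Slope and intercept of a regression line referred to the original scales."""
    slope = r * sigma_dep / sigma_ind
    intercept = mean_dep - slope * mean_ind
    return slope, intercept

# Constants of the leaf example, in millimeters.
a_x, a_y = 26.04, 36.25
sigma_x, sigma_y = 3.90, 7.55
r = 0.63

slope_yx, intercept_yx = to_original_axes(r, sigma_y, sigma_x, a_y, a_x)  # Y on X
slope_xy, intercept_xy = to_original_axes(r, sigma_x, sigma_y, a_x, a_y)  # X on Y

print(round(slope_yx, 2), round(intercept_yx, 2))
print(round(slope_xy, 2), round(intercept_xy, 2))
print(round(slope_yx * 22 + intercept_yx, 1))  # expected length for breadth 22
print(round(slope_xy * 27 + intercept_xy, 1))  # expected breadth for length 27
```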







Fig. 56.—Regression lines of Y on X and X on Y.

30. Regression Lines Plotted.—These regression lines are
plotted in Fig. 56. The line RR' shows the regression of y on x.
It goes through the origin M for the equation

    y = 1.22x.

For the origin O, the equation is

    Y = 1.22X + 4.49,

with the Y-intercept, 4.49 = OB.

The line SS' shows the regression of x on y. It goes through
the origin M for the equation

    x = 0.33y.

For the origin O the equation is

    X = 0.33Y + 14.26,

with the X-intercept,

    14.26 = OA.

    OC = a_X and OD = a_Y.

31. Points Plotted from the Table.—The points marked with
small crosses are determined from columns y' and x'_{ax} in Table
LXX. The points marked with small circles are determined
from rows x' and y'_{ay}. In both cases, the corrections for \bar{x}' and
\bar{y}' have been applied, and the results multiplied by the proper
class interval. As an example for determining the points, take P,
the uppermost circle. It is for x' = 3 and y'_{ay} = 2.50 of the
table.

    x = x' - \bar{x}' = 3.00 - 0.35 = 2.65 class intervals,
    x = 3 \times 2.65 = 7.95 mm.

    \bar{y} = y'_{ay} - \bar{y}' = 2.50 - (-0.15) = 2.65 class intervals,
    \bar{y} = 5 \times 2.65 = 13.25 mm.

The point P = (7.95, 13.25) is then plotted on the x-axis and
y-axis. The other points are determined in the same way.

If either set of points be plotted, one may frequently be able to
tell by experience whether the points group themselves sufficiently
close to a straight line to warrant assuming linear
regression without further computation.

Besides Pearson’s coefficient r, and the correlation ratios
$\eta_{xy}$ and $\eta_{yx}$, other measures of correlation have been devised.
One important coefficient of correlation is that from ranks.

32. Rank Correlation.—Rank is the position of an item in an
array such that the item of rank 1 has no item preceding it. The
item of rank 2 has one and only one preceding it. The item of
rank 3 has two and only two items preceding it, and so on.

The formula for rank correlation is

$$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)},$$

in which d stands for the difference in rank of the same item with
respect to two different characteristics. N is the number of
items or ranks.

If ranks with respect to two characteristics are regarded as the
values, X and Y, of two associated variables, and if there are no
duplications of rank, and no ranks lacking, this formula for $\rho$
(rho) is the same as for Pearson’s r. It thus assumes linear
regression. It is frequently applied to other distributions with
the risk of leading to serious inaccuracies. It should always be
used with great caution.

33. Illustration.—As an illustrative example, take the following
table of ranks of five students in English and in Mathematics,
Table LXXI.


Table LXXI.—Ranks of Five Students in English and Mathematics

                   Rank
Student    English    Mathematics      d      d²
   A          1            2           1      1
   B          2            4           2      4
   C          3            1           2      4
   D          4            5           1      1
   E          5            3           2      4

                                   Σd² = 14,  N = 5


Mr. A ranked first in English and second in Mathematics; Mr.
B ranked second in English and fourth in Mathematics; and so on.

$$\rho = 1 - \frac{6 \times 14}{5(25 - 1)} = 0.30.$$


This indicates that there was not very high correlation between
English grades and Mathematics grades for these five students.
This means that, among the students of whom these five may be
regarded as a representative sample, one cannot expect those
getting high English grades necessarily to get high Mathematics
grades, and vice versa.
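
The computation for Table LXXI can be reproduced with a few lines of code. This is only a sketch of the rank formula as stated in the text; the function name `rank_correlation` is an assumption made for illustration.

```python
# Minimal sketch of the rank-correlation formula, applied to Table LXXI.
# rank_correlation is a hypothetical helper name, not from the text.

def rank_correlation(ranks_x, ranks_y):
    """rho = 1 - 6 * sum(d^2) / (N (N^2 - 1)), for ranks without duplicates."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


english = [1, 2, 3, 4, 5]        # students A to E
mathematics = [2, 4, 1, 5, 3]

rho = rank_correlation(english, mathematics)
print(round(rho, 2))
```

The squared differences are 1, 4, 4, 1, 4, whose sum is 14, as in the table.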

34. Derivation of Formula.—Let X and Y be the ranks of the
same item with respect to two characteristics; $a_X$ and $a_Y$ the
corresponding respective arithmetic means; and $\sigma_x$ and $\sigma_y$ their
standard deviations. The X ranks, under the assumption of no
duplications, run through the series of natural numbers, 1, 2, 3,
. . . N.

It is readily proved in algebra that the sum of the series

$$1 + 2 + 3 + \dots + N = \frac{N(N + 1)}{2}$$

and

$$1^2 + 2^2 + 3^2 + \dots + N^2 = \frac{1}{6} N(N + 1)(2N + 1).$$

Then,

$$a_X = \frac{\sum X}{N} = \frac{1 + 2 + 3 + \dots + N}{N} = \frac{N + 1}{2}.$$

Since Y has the same set of values,

$$a_Y = a_X = \frac{N + 1}{2}.$$

The two arithmetic means are equal.

$$N\sigma_x^2 = \sum (X - a_X)^2 = \sum X^2 - 2a_X \sum X + N a_X^2.$$

Substituting the values of $\sum X^2$ and $\sum X$ from the above series,
and the value of $a_X$,

$$N\sigma_x^2 = \frac{1}{6} N(N + 1)(2N + 1) - 2 \cdot \frac{N + 1}{2} \cdot \frac{N(N + 1)}{2} + \frac{N(N + 1)^2}{4} = \frac{N^3 - N}{12}.$$

Therefore,

$$\sigma_x^2 = \frac{N^2 - 1}{12} = \sigma_y^2.$$

Let $X - a_X = x$, deviation of X from arithmetic mean,
and
$Y - a_Y = y$, deviation of Y from arithmetic mean.
Subtracting,

$$X - Y = x - y,$$

since $a_X = a_Y$. Now

$$\sigma_x^2 = \frac{\sum x^2}{N}, \qquad \sigma_y^2 = \frac{\sum y^2}{N}.$$

Let

$$\sigma_{x-y}^2 = \frac{\sum (X - Y)^2}{N} = \frac{\sum (x - y)^2}{N}.$$

Now

$$\sum (x - y)^2 = \sum x^2 + \sum y^2 - 2 \sum xy.$$

So

$$N\sigma_{x-y}^2 = N\sigma_x^2 + N\sigma_y^2 - 2 \sum xy. \tag{1}$$

Pearson’s coefficient is

$$r = \frac{\sum xy}{N \sigma_x \sigma_y},$$

so that

$$\sum xy = N r \sigma_x \sigma_y.$$

Substituting in Eq. (1),

$$N\sigma_{x-y}^2 = N\sigma_x^2 + N\sigma_y^2 - 2N r \sigma_x \sigma_y.$$

Solving for r,

$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x \sigma_y}.$$

Since $\sigma_x = \sigma_y$, substituting $\sigma_x$ for $\sigma_y$ gives

$$\begin{aligned}
r &= \frac{2\sigma_x^2 - \sigma_{x-y}^2}{2\sigma_x^2} \\
  &= 1 - \frac{\sigma_{x-y}^2}{2\sigma_x^2} \\
  &= 1 - \frac{\sum (X - Y)^2 / N}{2 \cdot \dfrac{N^2 - 1}{12}} \\
  &= 1 - \frac{6 \sum (X - Y)^2}{N(N^2 - 1)} \\
  &= 1 - \frac{6 \sum d^2}{N(N^2 - 1)},
\end{aligned}$$

since X - Y is d, the difference in rank. This expression for r
is called $\rho$ to distinguish it from r itself.
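
The identity just derived can be checked numerically. The sketch below assumes ranks without duplicates and uses the five students of Table LXXI; `pearson_r` and `rho_from_ranks` are illustrative names, not from the text.

```python
from math import sqrt

# Minimal sketch: for ranks without duplicates, Pearson's r computed on
# the ranks themselves agrees with 1 - 6*sum(d^2) / (N (N^2 - 1)).
# Helper names are hypothetical.

def pearson_r(xs, ys):
    n = len(xs)
    ax, ay = sum(xs) / n, sum(ys) / n
    sxy = sum((x - ax) * (y - ay) for x, y in zip(xs, ys))
    sxx = sum((x - ax) ** 2 for x in xs)
    syy = sum((y - ay) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rho_from_ranks(xs, ys):
    n = len(xs)
    return 1 - 6 * sum((x - y) ** 2 for x, y in zip(xs, ys)) / (n * (n * n - 1))


x_ranks = [1, 2, 3, 4, 5]
y_ranks = [2, 4, 1, 5, 3]
n = len(x_ranks)

# Variance of the ranks 1..N, to compare with (N^2 - 1) / 12.
mean_rank = sum(x_ranks) / n
variance = sum((x - mean_rank) ** 2 for x in x_ranks) / n

print(pearson_r(x_ranks, y_ranks), rho_from_ranks(x_ranks, y_ranks))
print(variance, (n * n - 1) / 12)
```

Both pairs of printed values should agree, up to rounding in floating-point arithmetic.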

35. Duplicated Ranks.—If there are duplications of rank, what
is known as the mid-rank method may be used. It is best
shown by an illustration, using Table LXXII.

If the grades of C and D had not been alike but between 96
and 88, their ranks would have been 3 and 4, respectively. The
mid-rank method assigns them each a rank halfway between 3
and 4, namely 3½. The grades of F, G, and H, being all 85,
are treated the same way in ranking. If they had been different,
the ranks would have been 6, 7, and 8. The mid-rank 7 is given
to each. Then the formula for $\rho$ is applied as before.





Table LXXII.—Ranks of 10 Students in English as Shown by Their
Grades

Student    Grade    Rank
   A         98       1
   B         96       2
   C         95       3½
   D         95       3½
   E         88       5
   F         85       7
   G         85       7
   H         85       7
   I         75       9
   J         60      10
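
The mid-rank assignment of Table LXXII can be sketched in code. The function `mid_ranks` is a hypothetical helper written for this illustration; it ranks from the highest grade downward and gives tied grades the average of the ranks they would otherwise occupy.

```python
# Minimal sketch of the mid-rank method, assuming rank 1 goes to the
# highest grade. mid_ranks is a hypothetical name, not from the text.

def mid_ranks(grades):
    order = sorted(range(len(grades)), key=lambda i: -grades[i])
    ranks = [0.0] * len(grades)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next grade equals the current one.
        while end + 1 < len(order) and grades[order[end + 1]] == grades[order[pos]]:
            end += 1
        # Average of the ranks (pos + 1) .. (end + 1) the tied items would occupy.
        mid = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mid
        pos = end + 1
    return ranks


grades = [98, 96, 95, 95, 88, 85, 85, 85, 75, 60]   # students A to J
print(mid_ranks(grades))
```

Because each tied group receives the average of its ranks, the sum of the ranks (55 here) is the same as it would be without duplication.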


36. Error Due to Duplication.—In this way the arithmetic
mean of ranks is preserved, but an error is introduced into the
standard deviation. If there are few cases of duplication and
only two duplicates in each case, the error is small. If there are
many cases of duplication of rank or several individuals having
the same rank, the error may lead to inaccurate conclusions.

See, for example, what error is introduced in standard deviation
in the above example in Table LXXII. Standard deviation,
computed from Table LXXIII, is $\sqrt{8.00} = 2.828$.


Table LXXIII.—Computation of Standard Deviation of Ranks of
Table LXXII

Student    Rank      d        d²
   A         1     -4.5     20.25
   B         2     -3.5     12.25
   C         3.5   -2.0      4.00
   D         3.5   -2.0      4.00
   E         5     -0.5      0.25
   F         7      1.5      2.25
   G         7      1.5      2.25
   H         7      1.5      2.25
   I         9      3.5     12.25
   J        10      4.5     20.25

Total       55              80.00

$a = 55/10 = 5.5$        $\sigma^2 = 80/10 = 8.00$








Standard deviation by formula, when the ranks are all different,
is

$$\sigma = \sqrt{\frac{N^2 - 1}{12}} = \sqrt{\frac{99}{12}} = 2.872.$$

The error is about 2 per cent.

If only C and D had been of equal rank, $\sigma^2$ would have been
8.2, differing but little from $\sigma^2 = 8.25$, derived from the formula.

Of course, in case of decided error in $\sigma$, it is not very laborious
to compute the correct value and it should be done.
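
The size of the error can be checked directly. The following sketch compares the variance of the mid-ranks of Table LXXII with the value given by the formula for unduplicated ranks; the variable and function names are illustrative assumptions.

```python
from math import sqrt

# Minimal sketch: variance of ranks with duplicates versus (N^2 - 1) / 12.
# rank_variance is a hypothetical helper name.

def rank_variance(ranks):
    n = len(ranks)
    mean = sum(ranks) / n
    return sum((r - mean) ** 2 for r in ranks) / n


tied = [1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10]        # Table LXXII
only_cd = [1, 2, 3.5, 3.5, 5, 6, 7, 8, 9, 10]     # only C and D tied

n = len(tied)
var_formula = (n ** 2 - 1) / 12
var_tied = rank_variance(tied)
var_only_cd = rank_variance(only_cd)

# Relative error in the standard deviation caused by the duplications.
rel_error = (sqrt(var_formula) - sqrt(var_tied)) / sqrt(var_formula)

print(var_formula, var_tied, var_only_cd)
print(round(100 * rel_error, 1), "per cent")
```

With these figures the relative error in the standard deviation comes out between 1 and 2 per cent, in line with the rounded statement in the text.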

In the formula for $\rho$, each value of the variable has been assigned
to its proper rank. This may be expected to give a different
value from the product-moment method where the actual grades
would be used. For example, if two sets of values were as follows,

X    100   99   98   90   40   30   10    5
Y     99   98   97   96   95   94   93   90

differences of rank are all zero, and hence $\rho = 1$.

It is at once seen that the standard deviation of X will be very
much larger than that given by ranks. That for Y will be more
nearly the same as that by ranks. The coefficient $\rho$ will differ
materially from r.

Comparison of $\rho$ and r is hardly justifiable in many cases.
One set of values for two associated variables may have the same
differences in ranks as has another set of values, giving the same $\rho$
in both cases, while the two values of r differ quite materially.
In the above example, $\rho = 1$. Computing r, it is found to be
0.925. If the same values of X are used and the values of Y
are each twice the corresponding values of X, then $\rho = 1$ and
r = 1.
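
The contrast between the rank coefficient and the product-moment coefficient for these eight pairs can be sketched as follows. The helper names are assumptions made for illustration, and the simple `ranks` function is valid only when no values are duplicated. Carried to more decimals, r for these figures comes out at about 0.92, close to the value quoted in the text.

```python
from math import sqrt

# Minimal sketch comparing the rank coefficient with Pearson's r.
# pearson_r and ranks are hypothetical helper names.

def pearson_r(xs, ys):
    n = len(xs)
    ax, ay = sum(xs) / n, sum(ys) / n
    sxy = sum((x - ax) * (y - ay) for x, y in zip(xs, ys))
    sxx = sum((x - ax) ** 2 for x in xs)
    syy = sum((y - ay) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 for the largest value; valid only when there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


X = [100, 99, 98, 90, 40, 30, 10, 5]
Y = [99, 98, 97, 96, 95, 94, 93, 90]

rho = pearson_r(ranks(X), ranks(Y))          # coefficient from ranks
r = pearson_r(X, Y)                          # coefficient from actual values
r_doubled = pearson_r(X, [2 * x for x in X]) # Y taken as twice X

print(rho, round(r, 3), r_doubled)
```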

If it is not possible to put quantitative values on the variables,
but they can be ranked, then resort to the use of $\rho$. Karl Pearson
has derived the relationship

$$r = 2 \sin\left(\frac{\pi}{6} \rho\right)$$

when there is normal probability distribution in the actual scores.
He comments that the method of obtaining r by means of $\rho$
is less exact than the method developed in this chapter.

37. Graphical Determination of r.—Graphically, a good
value of r may be obtained by plotting the points for the regression
lines, drawing the regression lines by the eye, and measuring
their slopes $b_1$ and $b_2$. The product of these slopes gives $r^2$.

$$r = \sqrt{b_1 b_2}.$$
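
As a rough numerical illustration, the two regression coefficients quoted earlier in the chapter for the leaf table (1.22 for Y on X and 0.33 for X on Y) may be combined in this way. The sketch assumes both slopes have the same sign, r taking that sign; the variable names are illustrative.

```python
from math import sqrt

# Minimal sketch: r from the product of the two regression slopes.
# b1 and b2 are the rounded coefficients quoted for the leaf data.
b1 = 1.22   # regression coefficient of Y on X
b2 = 0.33   # regression coefficient of X on Y

r = sqrt(b1 * b2)   # both slopes are positive here, so r is positive
print(round(r, 2))
```

Slopes measured by eye from a graph would of course give only an approximate value.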

No further methods of measuring correlation will be taken up 
in this book. 

38. Warning.—Too much stress cannot be put on using the 
coefficient of correlation, as well as other statistical constants, 
with care. Do not draw conclusions that are unwarranted. 
Remember that r might be small, though there is perfect correla¬ 
tion when regression is non-linear. On the other hand, r may be 
large and the variables have no actual connection. Table 
LXXIV gives an illustration. 


Table LXXIV.—Bank Clearings in Billions of Dollars and Compound
Amount of $1 at 5 Per Cent

          Compound     Bank
          amount of    clearings
Date      $1 at 5 per  in billions,     x         y        x²        y²        xy
          cent, X      Y

1883        1.00         14          -1.40    -24.75     1.96      612.6     34.65
1893        1.63         23          -0.77    -15.75     0.59      248.1     12.13
1903        2.65         43          +0.25    + 4.25     0.06       18.1      1.06
1913        4.32         75          +1.92    +36.25     3.69    1,314.1     69.60

Totals      9.60        155                              6.30    2,192.9    117.44

$$a_X = \frac{9.60}{4} = 2.40. \qquad a_Y = \frac{155}{4} = 38.75.$$

$$r = \frac{117.44}{\sqrt{6.30 \times 2{,}192.9}} = 0.999.$$


This shows practically perfect correlation. Yet no one thinks 
that a dollar increases at compound interest because bank clear¬ 
ings increase. Nor can it be thought that bank clearings increase 
because of compound amount. In fact, a different set of dates 
for bank clearings would have given a very different result. 
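
The coefficient for Table LXXIV can be recomputed from the four pairs of values. This is a sketch using the figures of the table; the variable names are illustrative.

```python
from math import sqrt

# Minimal sketch: Pearson's r for the four pairs of Table LXXIV.
X = [1.00, 1.63, 2.65, 4.32]   # compound amount of $1 at 5 per cent
Y = [14, 23, 43, 75]           # bank clearings, billions of dollars

n = len(X)
a_x = sum(X) / n
a_y = sum(Y) / n

sum_x2 = sum((x - a_x) ** 2 for x in X)
sum_y2 = sum((y - a_y) ** 2 for y in Y)
sum_xy = sum((x - a_x) * (y - a_y) for x, y in zip(X, Y))

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(round(a_x, 2), a_y, round(r, 3))
```

A coefficient this close to 1 from only four pairs of values, chosen at 10-year intervals, illustrates how little a high r by itself establishes.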

Correlation, as dealt with in this chapter, has picked out two 
variables and attempted to discover a probable relationship. 
As a matter of fact, the two variables are usually affected by a 
multitude of causes that have not been taken into account. 

A high coefficient of correlation does not prove anything. It
suggests the probability of a cause-and-effect relationship between
the two variables. The statistician has no further concern. The
investigator, dealing with these variables, takes the suggestion
and searches for a reason. Frequently, the statistician and the
investigator are the same individual. He must be careful not to
let his statistics run away with his reason.

Exercises 

1. Compute Pearson’s coefficient of correlation for each of Tables LXII 
to LXIX, inclusive. What is the reason for the increasing values of r? 

2. Compute Pearson’s coefficient of correlation for each of the Tables 
LXII to LXIX, inclusive, using the short-cut method as displayed in Table 
LXX. 

3. Find $\eta_{xy}$ and $\eta_{yx}$ for each of the Tables LXII to LXIX inclusive.

4. Apply the test of Sec. 26 and determine whether there is sufficient 
departure from linear regression to be significant for any of the Tables LXII 
to LXIX. 

5. Plot the regression lines for each of Tables LXVI to LXIX inclusive.

6. Using the frequency table of mean monthly temperatures at Seattle 
as found in Ex. 2, Chap. II, and the frequency table of monthly precipitation 
as formed in Ex. 3, Chap. II, construct a correlation table similar to Table 
LXX. Determine the arithmetic mean and standard deviation for mean 
monthly temperature and for monthly precipitation. Compute Pearson’s 
coefficient of correlation and its probable error. What is the meaning of 
the negative sign of r? 

7. Determine the regression coefficients of mean monthly temperature
on monthly precipitation and of monthly precipitation on mean monthly
temperature, using the data of Ex. 6.

8. Given the following associated values of X and Y, compute Pearson’s
coefficient of correlation between X and Y and its probable error.

X    3    5    8   12   17   23   30
Y    1    2    6   24   40   50   60

9. From the following table compute the coefficient of correlation 
between savings-bank deposits and strikes and lockouts in the United 
States over the period 1916 to 1922 inclusive. 

Savings-bank Deposits and Strikes and Lockouts in United States

          Savings-bank deposits,     Strikes and lockouts, in
Date      in billions of dollars     thousands


1916 

1917 

1918 

1919 

1920 

1921 

1922 





10. Compute the probable error of r obtained in Ex. 9. What is the 
significance of the result? 

11. Find Pearson’s coefficient of correlation between leaf length and 
leaf breadth as shown in Table XXII. Use the classes of Tables XXIII 
and XXIV. Use the short-cut method. Compute the probable error of the 
coefficient. 

12. Find Pearson’s coefficient of correlation between leaf length and leaf 
breadth from the data of Ex. 18, Chap. II. Use the classes of the tables in 
Exs. 19 and 20, Chap. II. Compute the probable error of the coefficient. 

13. Compute Pearson’s coefficient of correlation between leaf length and
leaf breadth from the data of Ex. 18, Chap. II. Use the same class intervals
as were used in Exs. 19 and 20, Chap. II, but change the class limits. Compute
the probable error. How do the results compare with those of Ex. 12?

14. Plot the regression lines for the data used in Ex. 12. 

15. Plot the regression lines for the data used in Ex. 13. 

16. Letting leaf breadth be the X variable, and leaf length be the Y
variable, find $\eta_{xy}$ and $\eta_{yx}$ for the data of Ex. 12.

17. Similarly, find $\eta_{xy}$ and $\eta_{yx}$ for the data of Ex. 13.

18. Apply the test of Sec. 26 and determine whether there is sufficient
departure from linear regression to be significant in the data used in Ex. 12.

19. Similarly, apply this test to the data of Ex. 13.



CHAPTER X 


LOGARITHMIC GRAPHICAL REPRESENTATION 

Non-uniform scales were briefly treated in Chap. III. It was
seen that if two variables, x and y, were related by the equation

$$y = ka^x,$$

by plotting pairs of values of x and log y the resulting graph
would be a straight line. Look up and tabulate the logarithms of
the values of y and plot these with the corresponding values of x.
This is equivalent to making a uniform scale of logarithms for the
y-scale and plotting the values of log y on this scale. The best
way, however, is to have a non-uniform, logarithmic scale on
which to plot the values of y directly without having to look up
logarithms in a table.

Graph paper ruled to such a scale is now readily obtained on the
market.¹ Paper ruled to uniform scale one way and logarithmic
scale the other way is called semi-logarithmic. It will be
assumed that such ruled paper is at hand.

1. Illustration.—With the following set of pairs of values of x
and y, the two methods may be illustrated:

x          10       20       30       40       50
y        1.791    3.207    5.743   10.286   18.420
log y    0.253    0.506    0.759    1.012    1.265


¹ Codex Book Company, New York, and the Standard Graphic Chart
Company, New York.

Figure 57 shows a uniform x-scale and a uniform log y-scale.
Plotting the above pairs of values of x and log y, it may be seen
that the points fall in a straight line. Now plotting the values of
x and y directly on semi-logarithmic paper, line AB (Fig. 58) is
obtained. This also proves to be a straight line. In either
Fig. 57 or 58, the fact that the line is straight shows uniform
change in log y. That is, for equal changes in x, the changes in
log y are also equal. Or, the changes in log y are proportional to
changes in x. Now

$$\log y_2 - \log y_1 = \log \frac{y_2}{y_1},$$

$$\log y_3 - \log y_2 = \log \frac{y_3}{y_2},$$

and so on.

If these successive differences of logarithms are equal, the ratios
of the corresponding values of y are equal. If

$$\log y_2 - \log y_1 = \log y_3 - \log y_2,$$

then

$$\frac{y_2}{y_1} = \frac{y_3}{y_2}.$$

Since these figures give straight lines, it is seen at once that
with equal changes of x (say 10 at a time) the changes in the
corresponding ordinates to the line are equal. This means that
the ratios of the corresponding values of y are equal. That is, the
rate of change in y is constant. These facts might have been seen
by inspection of the table of values. The variable x changes 10
at a time. The corresponding values of log y change 0.253 at a
time. The ratio of each value of y to the preceding value is 1.791.
For x = 10, y = 1.791. For x = 20, y is 1.791 × 1.791 = 3.207.
For x = 30, y is 1.791 × 3.207 = 5.743, and so on. The values
of x form an arithmetic progression, while the corresponding
values of y form a geometric progression. For this reason, the
graph of Fig. 58 is frequently called a geometric chart, those to
uniform scale being called arithmetic charts.
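
The equality of the successive differences of logarithms, and of the successive ratios, can be verified from the tabulated values. The sketch below uses common (base 10) logarithms, as in the table; the variable names are illustrative.

```python
from math import log10

# Minimal sketch: equal steps in x give equal steps in log y,
# that is, equal ratios of successive values of y.
x = [10, 20, 30, 40, 50]
y = [1.791, 3.207, 5.743, 10.286, 18.420]

log_steps = [log10(b) - log10(a) for a, b in zip(y, y[1:])]
ratios = [b / a for a, b in zip(y, y[1:])]

print([round(s, 3) for s in log_steps])
print([round(q, 3) for q in ratios])
```

Small departures in the last decimal place arise only from the rounding of the tabulated values.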

2. Reading of Ratios.—Since $\log (y_2/y_1) = \log y_2 - \log y_1$, the
ratio of $y_2$ and $y_1$ may be determined, at once, graphically from
the chart. With a divider, take off the difference of the ordinates
at $x_2$ and $x_1$. Set this on the logarithmic scale with one leg
of the divider at 1, and read 1.79 at the point on the scale where
the other leg of the divider falls. The reading on the logarithmic
scale gives the value directly, since the points on the scale are
marked with the numbers of which the distances are logarithms.
So y is multiplied by 1.79 for every increase of 10 in x. By taking
the difference of ordinates for a difference of 1 in x, the rate of
change in y per unit change in x will be determined. In this
figure, the scale is too small to do this very accurately, but if care
is used in using the divider to transfer the difference in ordinates
to the y-scale, it will be found that the result is about 1.06. This
means that for an increase of 1 in x, y is multiplied by 1.06. In
other words, the graph shows the compound amount of $1 at 6
per cent per annum compounded annually. The straight line
AB may be extended and the compound amount read off for any
number of years.
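
The ratio per unit of x that is read off with the divider corresponds to taking one-tenth of the difference of logarithms for a change of 10 in x. A brief sketch, with illustrative variable names:

```python
from math import log10

# Minimal sketch: from the ratio for a change of 10 in x to the ratio
# for a change of 1 in x, by dividing the difference of logarithms by 10.
ratio_per_10 = 1.791

log_step_per_unit = log10(ratio_per_10) / 10
ratio_per_unit = 10 ** log_step_per_unit

print(round(ratio_per_unit, 3))

# Conversely, compounding the unit ratio 1.06 ten times:
print(round(1.06 ** 10, 3))
```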

3. Scale Numbers by Decks.—Notice that the printed scale
at the left-hand margin of the paper runs up from 1 to 9 and
then repeats. The distance from 1 to 1 is the logarithm of 10
and equals the unit of the scale to which logarithms were plotted.
A complete ruling from 1 to 1 is called a deck. The publishers of
such ruled paper provide sheets with one deck, two decks, or
more. The series of printed figures on each deck is the same.
The logarithm of 1 is 0. Any 1 of any deck may be used as the 0,
or starting point. If the bottom 1 on the sheet is the starting
point, or zero, the next one above it is read 10, since it is at unit
distance from the zero, and log 10 = 1. The next 1 above that
is read 100, since it is at two units distance from the zero, and log
100 = 2. If there are three decks, the next 1 would be read
1,000, since log 1,000 = 3, and so on.

4. Reading of Scale Numbers.—The lowest 1 may be read 10,
meaning that the zero or starting point is the width of one deck
below this point, and off the paper. The next 1 above is read 100,
the next 1,000, and so on. In fact, the lowest 1 may be read as
any integral power of 10, such as 1,000, 100, 10, 1, 0.1, 0.01,
and so on indefinitely. Then the next 1 above is the next power
of ten, and so on. If the lowest 1 is 100,000, the next above it is
1,000,000, the next above that, 10,000,000, etc. Or if the top 1
is 0.01, the next below is 0.001, the next below that is 0.0001,
and so on.


5. Logarithmic Graph Unlimited.—With these facts in mind,
the statement that the straight line AB (Fig. 58) may be extended
for any number of years is literally true. If, in this figure, the
line AB is extended to the top of the sheet, the compound amount
read where it reaches the top line, at C, is $100 if the lowest 1 is
$1. This is at a little beyond x = 79. From the point C, drop
vertically down to the bottom line of the ruling at D. Call the
bottom line 100, and extend the graph with the same slope as
before till it hits the right-hand margin at E, where x = 110.
From this point go horizontally to the left-hand margin, mark the
x-scale 110 in place of 10, and extend the graph with its proper
slope till it reaches the top line. Proceed as before till a sufficient
range is covered. Of course, if the graph is divided into too
many parts, it becomes confusing.

From this graph may be read not only the compound amount at
any date, but the date at which the amount has reached any given
value. For example, in how many years will $1, compounded
annually at 6 per cent per annum, amount to $10? At the
point on the graph where y is 10, x is about 39.5. In about
39.5 years the compound amount of $1 at 6 per cent is $10.
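
The graphical reading can be compared with a direct computation. Since the amount after x years is 1.06 raised to the power x, the number of years needed to reach a given amount is the logarithm of that amount divided by log 1.06. The function name below is an illustrative assumption.

```python
from math import log10

# Minimal sketch: years for $1 at 6 per cent, compounded annually,
# to reach a given amount. years_to_reach is a hypothetical name.

def years_to_reach(amount, rate=0.06, principal=1.0):
    return log10(amount / principal) / log10(1 + rate)


print(round(years_to_reach(10), 1))
print(round(years_to_reach(100), 1))
```

The second figure is the abscissa at which the extended line reaches the reading 100 on a two-deck sheet whose lowest 1 is read as $1.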

6. Further Illustration.—Now tabulate and plot the compound
amounts of $10 at 6 per cent.

x       10       20       30       40        50
y    17.91    32.07    57.43   102.86    184.20

Since the rate is the same, the ratio of amounts at any two
given values of x will be the same for $10 as for $1. So the graph
will be parallel to AB. It starts at a, proceeds to c, drops down
to d, and then up to b, with the same slope as the first graph.

Since the graphs are parallel, the vertical distance between them
is everywhere the same. Taking off, with the dividers, the vertical
distance between the two graphs and reading it on the y-scale,
it is found to read 10. Therefore, the second amount, at any
date, is 10 times the first amount.

7. Contrast with Uniform Scale.—In order to contrast the use
of the logarithmic scale with that of a uniform scale, the two
sets of data are plotted on uniform scales in Fig. 59. The line
AB goes up rather slowly. The line ab goes up much more
rapidly, with increasing rapidity as x increases. Almost anyone
looking at this chart would say at once that the variable represented
by ab is increasing more rapidly than that shown by AB.
The logarithmic graph shows at a glance that they are increasing
at the same rate.

What Fig. 59 does show at a glance is that the absolute
increases for each 10-year period are increasing, and that for the
upper curve, they are much greater than for the lower. The uniform
scale chart, showing absolute differences, is sometimes called
a difference chart in contrast to ratio chart for that with a logarithmic
scale. If a comparison in rate of growth is desired, the difference
chart is almost useless. While differences are not at once
seen on the ratio chart, they are easily obtained by following
over from points on the graph to the scale marks and reading
them. The business man, the biologist, the sociologist, and the
economist are almost always concerned with comparisons in
growth and rate of growth. Hence it is that the ratio chart
is being used more and more. When its simplicity and power are
understood, it appeals very strongly to the investigator in almost
any branch where graphical studies can be used.



8. Organic Growth.—The equation

$$y = ka^{cx}$$

expresses what is called the law of organic growth. For equal
differences in x, there are equal ratios in y. If x varies in arithmetic
progression, y varies in geometric progression. This is
shown by the simple case

$$y = 2^x$$

x    1    2    3    4    5    6
y    2    4    8   16   32   64

The x-series is an arithmetic progression with constant difference
of 1. The y-series is a geometric progression with constant ratio
of 2. Applied to a historical variable, the law states that at
equal intervals of time the variable changes by a constant per
cent. In the above example, if x is years, y doubles, that is, increases
100 per cent, each year. Many of the forces of nature follow more
or less closely some form of this law. It is the law of compound 
amount. Business affairs have a tendency to follow this law. 
If salesmanship and internal management remained in the same 
state of efficiency, and external influences were constant, business 
might be expected to follow the law of organic growth rather 
closely. 

9. Saturation Point.—As a matter of fact, it cannot be expected 
that any given business will follow this law indefinitely. There 
comes a slowing-up process, and what is called a saturation point 
is likely to be reached. The slowing-up process and the approach
to saturation point are readily shown in a ratio chart. Take as 
an illustrative example the data of Table LXXV. 


Table LXXV.—Sales in a Certain Business for 10 Years

Year        Sales        Increase     Per cent increase
  1      $  100,000
  2         200,000      $100,000          100
  3         340,000       140,000           70
  4         527,000       187,000           55
  5         724,600       197,600           37.5
  6         942,000       217,400           30
  7       1,177,400       235,400           25
  8       1,424,700       247,300           21
  9       1,681,200       256,500           18
 10       1,938,200       257,000           15


10. Illustration.—Plotted to uniform scale, these data give 
Fig. 60. It is evident that sales are increasing throughout the 
period. Since the curve gets steeper and steeper, it shows that 
the annual increase is getting larger from year to year throughout 
the period. This is also seen by inspection of the third column 
of the table. From the graph and the first three columns of the 
table, the inference might easily be drawn that this business is 
having a remarkable growth. 







Figure 61 is a logarithmic graph of the same data. The curve
goes up, showing an increase of sales every year. But the curve 
is concave downward; that is, it gets less and less steep with 
increase of time. This means that the per cent increase is getting 
less and less throughout the period. If that tendency continues, 
it may be expected that there will come a time when the curve 
will flatten out and cease to rise. When that time comes, the busi¬ 
ness will have reached a saturation point and will cease to grow 
unless efforts are made from within, or external conditions become 
more favorable. At any rate, it will be a critical situation, requir¬ 
ing careful treatment in order that the business may not decrease. 
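
The two views of the same sales figures, absolute increase and per cent increase, can be tabulated with a short sketch. The sales values are those of Table LXXV; the variable names are illustrative.

```python
from math import log10

# Minimal sketch: absolute increases rise every year while the
# per cent increases (and the steps in log sales) fall every year.
sales = [100_000, 200_000, 340_000, 527_000, 724_600,
         942_000, 1_177_400, 1_424_700, 1_681_200, 1_938_200]

increase = [b - a for a, b in zip(sales, sales[1:])]
per_cent = [100 * (b - a) / a for a, b in zip(sales, sales[1:])]
log_step = [log10(b) - log10(a) for a, b in zip(sales, sales[1:])]

for year, (inc, pc) in enumerate(zip(increase, per_cent), start=2):
    print(year, inc, round(pc, 1))
```

The steadily shrinking steps in log sales are what make the ratio chart concave downward, while the growing absolute increases make the uniform-scale chart steeper from year to year.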

11. Forecasting.—Internal and external conditions remaining 
the same, a forecast may be made as to when the saturation 
point will be reached by drawing a smooth curve through the 
points plotted, extending it to the point where a tangent will be 
horizontal, and reading the corresponding date. It must be 
remembered in making such a forecast that it merely says that 
past tendencies indicate so and so. Change in market conditions, 
domestic or foreign demand, or quality of salesmanship may 
change the result completely. Controllable elements entering 
into the situation may be manipulated to keep the curve on 
its upward way. Or slackness in methods may turn the curve 
downward at an earlier date. 


Table LXXVI.—Motor-car Registration in the United States¹

Year      Number of registrations
1912            1,010,399
1913            1,248,056
1914            1,768,963
1915            2,445,664
1916            3,512,996
1917            4,983,340
1918            6,146,617
1919            7,565,446
1920            9,231,941
1921           10,465,995
1922           12,238,375

¹ From “Statistical Abstract of the United States,” except 1912, 1913, and 1914, which are
from newspaper reports.





12. Example from Automobile Industry.—Figure 62, plotted
from Table LXXVI, shows this tendency to approach saturation
in the automobile industry. In 1913 and 1914 the curve grows
steeper, showing there was a tendency to increase at an increasing
rate. From 1914 to 1917 the curve is nearly straight, showing a
tendency to maintain a constant rate of growth. From 1917
to 1920 the curve is again nearly straight, but less steep, showing a
tendency to a constant rate of growth through the period but at a
lower rate than from 1914 to 1917. In 1921 the rate of growth
diminished still more. In 1922, however, the curve seems to be
starting up a little more steeply.
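
The slopes described above correspond to the ratio of each year's registrations to those of the year before. A sketch using the figures of Table LXXVI, with illustrative variable names:

```python
# Minimal sketch: year-to-year growth ratios of motor-car registrations.
# On a ratio chart, equal ratios appear as equal slopes.
registrations = {
    1912: 1_010_399, 1913: 1_248_056, 1914: 1_768_963, 1915: 2_445_664,
    1916: 3_512_996, 1917: 4_983_340, 1918: 6_146_617, 1919: 7_565_446,
    1920: 9_231_941, 1921: 10_465_995, 1922: 12_238_375,
}

growth = {
    year: registrations[year] / registrations[year - 1]
    for year in registrations
    if year - 1 in registrations
}

for year, ratio in growth.items():
    print(year, round(ratio, 3))
```

The ratios for 1915 to 1917 lie close together near 1.4, those for 1918 to 1920 close together near 1.2, and the ratio for 1921 is lower still, which is the pattern the text reads from the curve.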

A glance at the curve as a whole shows an undoubted tendency
to approach a high point, or point of saturation. The slight
increase in rate of growth for 1922 may quite likely be temporary.
Increased efforts at expansion may keep up the growth temporarily
or even increase it. New elements of prosperity or increase
of population may create sufficient growth in demand to keep up
the rate of growth of business for a time. But it is impossible to
get away from the general tendency of the 11-year period shown
by the curve as a whole. This much is certain: The automobile
industry is facing a probable saturation point in the not very
distant future, and should conduct its business with that possibility
in view. Later figures, only, will show the result.

Fig. 62.—Motor-car registration in the United States. Ratio chart.

13. Per Cent of Change.—Per cent decrease as well as per cent
increase may be read directly from a ratio chart. Figure 63 is
plotted from data of Table LXXVII, showing the total production
of gasoline and the total export in millions of gallons.

Table LXXVII.—Production of Gasoline and Exports of Gasoline
in the United States¹

                        Production, in         Exports, in
Date                    millions of gallons    millions of gallons

1921:
  First quarter               1,268                  155
  Second quarter                                     136
  Third quarter               1,268                  112
  Fourth quarter              1,312                  130
1922:
  First quarter               1,315                  141
  Second quarter              1,513                  168
  Third quarter               1,656                  140
  Fourth quarter              1,718                  131
1923:
  First quarter               1,823                  194

¹ “Commerce Yearbook,” 1922, U. S. Department of Agriculture.


Suppose it is asked what per cent the exports for the fourth
quarter of 1922 were of the exports for the second quarter of that
year. This would be obtained from the table by dividing 131
by 168. The ordinate at A is log 168, that at C is log 131; the
log (131/168) is log 131 - log 168 = BC. BC runs down and so is
negative. Take off BC with dividers and transfer it to the 1
at the middle of the left-hand margin as the zero point,
setting it off from 1 downward, since BC is negative. The
reading obtained is 0.78. Therefore,

131/168 = 0.78,

and the exports the fourth quarter were 78 per cent of those the
second quarter.

To find what fraction (per cent) one quantity is of a larger one,
take the difference of the ordinates of the two quantities, on a
logarithmic scale, and measure it down from the top of a deck.
The resulting reading gives the answer.

The exports for the first quarter of 1923 were how many times
those of the fourth quarter of 1922? The table gives

log (194/131) = log 194 - log 131 = CD

of the graph. CD runs up and is positive. With dividers, set
it off upward from 1 of the scale, getting a reading of 1.48. So
194 is 1.48 times 131, or is 148 per cent of 131.

To find how many times greater one quantity is than a smaller
one, take the difference of the ordinates of the two quantities,
on a logarithmic scale, and measure it up from the bottom of a
deck as zero point. The resulting reading gives the answer.

The exports for the fourth quarter of 1922 are what per cent
of the total production for that quarter? FC is the difference of
ordinates and is negative. Since FC is longer than the width of
a deck, it must be laid off from the top 1 of the scale. The result
reads 76. This is not 0.76, but 0.076, since the logarithm FC
lies between -1 and -2, making -2 the characteristic of the
logarithm. So the answer is 7.6 per cent.




The total production for the fourth quarter of 1922 was how 
many times the exports for that date? Now, not FC but CF is 
used. It is laid off upwards from 1. The reading is 131. The 
production was 13.1 times the exports. As CF is greater than 1 
and less than 2, the characteristic of the logarithm is 1. 

If one is unfamiliar with the meaning of characteristic of a
logarithm, the position of the decimal point may always be obtained
by inspection of the numbers used. The problem of position of 
the decimal point is the same as it is on the slide rule. 

Comparison of rates of change in these two variables is made 
more easy by moving the exports curve up one whole deck. The 
dotted line shows the resulting position of the graph. From the 
first to the second quarter of 1922, it is seen at once that the rate 
of change was nearly the same for both variables, because the two 
graphs are nearly parallel at this period. Throughout the entire 
period of time, it is easy to compare rates of change and see at 
what dates one rate was increasing and the other decreasing, or 
both changing in the same direction. It can also be seen which 
is changing at the greater rate by noting which graph is the 
steeper. If actual values of the two variables are to be considered, 
it must be remembered that the exports graph has been shoved 
up one deck so that the one for its scale is now the 10 for the 
production scale. If values of exports are read on the dotted graph, 
using the original scale, these values must each be divided by 10; 
that is, the decimal point must be moved one place to the left. 

The correct readings are easily seen by proper labeling of the 
graphs. The total production graph is in hundreds of millions and 
the transposed exports graph is in tens of millions. The two 
graphs should be correspondingly labeled either along the graph
or at one end.

14. Trend by Least Squares Line.—It may be desired to know
the general tendency in rate of change of a variable. To find
this, a least squares line may be fitted to the plotted points of the
logarithmic graph.

Let 

z = log y. 

Then in the least squares adjustment will be used $x$, $z$, $x'^2$,
and $x'z$. Take the z-axis at the average $x = a_x$. Then, referring
to the formulas derived in Chap. VIII, page 194, the slope

$$m = \frac{\Sigma x'z}{\Sigma x'^2}$$

and intercept

$$b = \frac{\Sigma z}{n}.$$

The data for computation of such a least squares line for gasoline
exports are shown in Table LXXVIII.

$x$ = number of time periods from the beginning of the record.

$y$ = exports of gasoline in millions of gallons.

$z = \log y$.

$a_x$ = average of $x$.

$x' = x - a_x$ = deviation of $x$ from average $x$.


Table LXXVIII.—Logarithmic Least Squares Line for Gasoline Exports

   x       y     x' = x - a_x    z = log y     x'^2       x'z
   0      155        -4           2.1903        16      -8.7612
   1      136        -3           2.1335         9      -6.4005
   2      112        -2           2.0492         4      -4.0984
   3      130        -1           2.1139         1      -2.1139
   4      141         0           2.1492         0       0.0000
   5      168         1           2.2253         1       2.2253
   6      140         2           2.1461         4       4.2922
   7      131         3           2.1173         9       6.3519
   8      194         4           2.2878        16       9.1512
  --------------------------------------------------------------
  36      ...       ...          19.4126        60       0.6466

$a_x = 36/9 = 4.$

Note the positive and the negative quantities in summing $x'z$.

$$m = \frac{0.6466}{60} = 0.01078 = \log 1.025.$$

$$10m = 0.1078 = \log 1.282.$$

$$b = \frac{19.4126}{9} = 2.1569 = \log 143.5.$$
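The computation of Table LXXVIII may be checked by machine. The following is a minimal sketch in Python, assuming only the nine export figures of the table; the variable names are illustrative. Because it carries the logarithms to more than four places, the results agree with the tabulated ones only to about the fourth decimal.

```python
import math

# Quarterly gasoline exports from Table LXXVIII, in millions of gallons.
exports = [155, 136, 112, 130, 141, 168, 140, 131, 194]

n = len(exports)
x = list(range(n))                      # time periods 0, 1, ..., 8
a_x = sum(x) / n                        # average of x
x_dev = [xi - a_x for xi in x]          # x' = x - a_x
z = [math.log10(y) for y in exports]    # z = log y

# Slope and intercept of the least squares line z = m x' + b.
m = sum(d * zi for d, zi in zip(x_dev, z)) / sum(d * d for d in x_dev)
b = sum(z) / n

growth_factor = 10 ** m      # antilogarithm of the slope, about 1.025
geometric_mean = 10 ** b     # antilogarithm of the intercept, about 143.5
print(round(m, 5), round(b, 4), round(growth_factor, 3), round(geometric_mean, 1))
```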

The z-intercept b is the arithmetic mean of the logarithms of
the values of y. Its antilogarithm, 143.5, is then the geometric
mean of the values of the exports y. For average of rates,
geometric mean should be used. The average x, $a_x = 4$, is the fourth
date after the beginning of the period; that is, it is at first quarter














of 1922. It is simply the middate of the entire period. The value
of b is laid off as an ordinate at this date. This is done by taking
off with a divider the value of 143.5 on the logarithmic scale,
setting it off as an ordinate at x = 4, first quarter of 1922, and
locating the point K at the z-intercept. The actual distance is,
of course, log 143.5, 2.1569. For greater accuracy in plotting,
10m is used to plot the slope. For a horizontal distance of 10
units of time, a rise is made of 10m = log 1.28, that is, 1.28 as
measured on the logarithmic scale.

OH = 10 units of time. 

HG = 1.28 on the logarithmic scale. 

From O to G gives the direction of the line. Draw a line
through K, the z-intercept, parallel to a line through O and G.
This line has the required slope and z-intercept. It is the least
squares line sought. It is shown in the figure as the broken line
MN. It may be called the “growth axis.” The slope of the
straight line connecting any two points in a logarithmic graph
determines average rate of change between the two points over the
period determined by them.

15. Meaning of Slope.—The slope m = log 1.025 means that, on
the average, exports of gasoline increased, or grew, 2.5 per cent
per quarter through the period tabulated. If the given data were
graduated to this line, the exports for the first quarter of 1921
would have been about 130,000,000 gal., as shown by the ordinate
to the point M at the beginning of the line of closest fit. At
the end of the period, first quarter of 1923, the exports would
have been about 159,000,000 gal. This is read directly at the
point N. It may be calculated by

$$130(1.025)^8 = 158.5.$$
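The ordinates at M and N may also be computed from the line itself. A minimal sketch in Python, taking m = 0.01078, b = 2.1569, and the middate x = 4 from the text; the function name is an assumption made for illustration.

```python
m = 0.01078   # slope of the logarithmic least squares line, from the text
b = 2.1569    # z-intercept at the middate, from the text
a_x = 4       # middate, counted in periods from the beginning of the record

def trend_value(x):
    """Ordinate of the trend line at period x, in millions of gallons."""
    return 10 ** (b + m * (x - a_x))

start = trend_value(0)   # point M, beginning of the record
end = trend_value(8)     # point N, end of the record
print(round(start), round(end, 1))
```

The ratio of the two ordinates is the antilogarithm of 8m, which is the same thing as compounding the growth factor 1.025 over the eight periods.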

16. Forecasting.—If the actual points of the graph seem to 
follow this least squares line, line of closest fit, line of trend, fairly 
well, fluctuating from one side to the other, it may be used to 
forecast the approximate number of gallons to be exported at 
later dates. What may be said is, that if the trend of exports 
for the period graphed continues, the exports for future dates 
will be shown, on the average, approximately by this line of 
trend. A safe prediction can be made only for so long a period 
as this trend continues. Judgment, experience, and knowledge 
of external and internal conditions must be used to determine 
for how long a time the forecast may be used. Since so many 




phenomena tend to follow the law of organic growth rather than 
a law of constant amount of increase, the logarithmic least squares 
adjustment is frequently much preferable to a least squares line 
on an arithmetic chart. 

If plotting of the original data to uniform scales reveals that 
the plotted points tend to group themselves about a straight line, 
fluctuating from one side to the other, then constant difference 
changes are indicated. The trend is then shown by a line of 
closest fit determined on the constant difference chart. 

If the points are plotted on semi-logarithmic paper and they 
seem to group themselves about a straight line, fluctuating from 
one side to the other, then the tendency to follow the law of organic 
growth is revealed. The trend line is plotted as a logarithmic 
least squares line. The values of ordinates to this line may be 
plotted on a uniform scale if a curve of closest fit is desired. This 
is the curve 

$$y = b \cdot 10^{mx}$$

of page 199, Chap. VIII, in which 10 is the base of the logarithms 
used. 

Many economic data, when plotted on a logarithmic scale, tend
to follow a curve which, for a time, may be nearly straight but
gradually tends to flatten out, showing the data are approaching
a saturation point. It has been found that such data may
frequently be fitted by a curve whose equation is

$$y = a k^{c^x}.$$

This is known as the “Gompertz curve.” Taking logarithms
of both sides gives

$$\log y = \log a + c^x \log k,$$

or

$$\log \frac{y}{a} = c^x \log k.$$

Again taking logarithms of both sides gives

$$\log \log \frac{y}{a} = x \log c + \log \log k.$$

Put z = log log y/a, m = log c, and b = log log k and the result is

$$z = mx + b,$$

which is linear in z and x.

If, then, for a set of observed values of x and y, the values of
the logarithm of the logarithm of y/a are tabulated and these
values are plotted with the observed values of x, and it is found





that the points tend to group themselves about a straight line,
fluctuating from one side to the other, a Gompertz curve may be
assumed to fit the original data. For the purpose of finding
whether the data follow a Gompertz curve, a value for a may be
assumed, since it is a constant multiplier. The fact that a
Gompertz curve fits the data reveals that there is a tendency toward
a saturation point. If a log log scale be constructed, points may
be plotted directly.
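The straightening of a Gompertz curve may be tried on made-up figures. The following is a minimal sketch in Python; the constants a, k, and c are hypothetical and are not taken from any table in the text. One point needs care: when k lies between 0 and 1, as it does for growth toward a ceiling a, log y/a is negative, so the sketch takes the logarithm of its absolute value. This is an assumption of the sketch; it changes the intercept term but leaves the slope log c as derived above.

```python
import math

# Hypothetical constants, chosen only for illustration.
a, k, c = 1000.0, 0.05, 0.8

xs = list(range(10))
ys = [a * k ** (c ** x) for x in xs]    # y = a * k^(c^x), rising toward a

# z = log |log (y/a)|; the absolute value is needed because log (y/a) < 0 here.
zs = [math.log10(abs(math.log10(y / a))) for y in ys]

# For a true Gompertz curve the successive differences of z are all log c.
slopes = [zs[i + 1] - zs[i] for i in range(len(zs) - 1)]
print([round(s, 4) for s in slopes])
```

With observed data the differences would only scatter about a constant, and a least squares line fitted to z and x would estimate log c.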

It has been attempted to fit distribution of incomes to the
equation,¹

$$y = ax^m.$$

This equation was briefly discussed in Chap. VIII.

Taking logarithms of both sides gives

$$\log y = m \log x + \log a.$$

Put z = log y, v = log x, and b = log a, and the equation takes
the form

$$z = mv + b,$$

which is linear in z and v. To plot points directly for data
supposed to fit this curve, logarithmic scales both ways are needed.
The equation is known as “Pareto’s law.”
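That the equation becomes a straight line in log x and log y may be verified numerically. A minimal sketch in Python, using hypothetical values of a and m and hypothetical values of x, none of them from the text; the least squares slope and intercept are computed from deviations about the averages.

```python
import math

# Hypothetical constants and x-values, for illustration only.
a, m = 5000.0, -1.5
xs = [1, 2, 5, 10, 20, 50]
ys = [a * x ** m for x in xs]          # y = a * x^m

v = [math.log10(x) for x in xs]        # v = log x
z = [math.log10(y) for y in ys]        # z = log y

n = len(v)
v_bar = sum(v) / n
z_bar = sum(z) / n

# Least squares line z = slope * v + intercept.
slope = sum((vi - v_bar) * (zi - z_bar) for vi, zi in zip(v, z)) / sum(
    (vi - v_bar) ** 2 for vi in v
)
intercept = z_bar - slope * v_bar

print(round(slope, 4), round(10 ** intercept, 1))   # recovers m and a
```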

Since as numbers grow larger, their logarithms differ less and
less (see any table of logarithms), a frequency distribution skewed
in the direction of large values may frequently be made
symmetrical by plotting the size of item to a logarithmic scale.

Recently paper has been printed having a uniform scale one
way and a probability scale the other way. On this paper, a
normal probability frequency distribution plots as a straight
line. There is also printed paper with a logarithmic scale one
way and a probability scale the other way. On this paper, a
certain type of skewed frequency distribution plots as a straight
line.

There are many applications of logarithmic and other
non-uniform scales. Users of graphical work should become familiar
with the making and interpretation of graphs to non-uniform
scales. There is considerable literature on the subject.

¹ See “Income in the United States,” vol. II, National Bureau of Economic
Research.




Exercises 

1. Plot the following data on uniform, or arithmetic, scales.

Year                              0    1    2    3    4    5    6    7    8    9   10
Sales, in thousands of dollars  100  170  160  190  300  260  320  410  410  420  500

(a) Compute and plot a least squares straight line of closest fit. 

(b) Plot the given data on semi-logarithmic paper with uniform scale for 
dates and logarithmic scale for sales. 

(c) Reading the values of ordinates to the least squares line in (a), plot
them on the logarithmic scale in (b). What is the significance of the
curvature of the line thus determined?

2. Repeat the construction of 1(b). Compute and plot a logarithmic
least squares straight line of closest fit.

Which is better for showing the trend of rate of change of sales, the line 
in 1 (c) or the logarithmic least squares line just plotted ? Which is better for 
showing the trend in per cent of change? 

3. Plot the following data on uniform or arithmetic scales.

Year                 1915  1916  1917  1918  1919  1920  1921  1922  1923  1924
Amount, in dollars      1     3     2     6     4    12     8    24    16    48


Plot the same data on semi-logarithmic paper with uniform scale for the 
dates and logarithmic scale for the amounts. 

What is the significance of the regular rise and fall of the logarithmic 
graph? Why is the other graph not also regular? 


Number of Automobiles and Number of Deaths from Automobiles
in the United States

Year    Number of automobiles     Number of deaths
        in use, in millions       from automobiles
1915           1.75                     5,928
1916           2.42                     7,397
1917           3.54                     9,184
1918           4.94                     9,672
1919           5.95                     9,827
1920           7.90                     8,878
1921           8.8S                     9,903
1922          10.51                    11,066
1923          10.96

4. Plot on semi-logarithmic paper the number of automobiles and the
number of deaths from automobiles.





From the graphs thus plotted, determine and plot the number of deaths 
per thousand automobiles for the period 1915 to 1922. 

On the assumption that the percentage decrease in number of deaths 
per thousand automobiles for 1923 would be the same as for 1922, continue 
the curve to 1923. From this, determine graphically the entry in the table 
for number of deaths in 1923. Continue the graph of number of deaths to 
1923 by a dotted line. 

5. Determine graphically the per cent increase in number of automobiles
for 1920 and the per cent decrease in number of deaths for the same year. 

Determine graphically the total per cent increase in number of automobiles 
for the period up to 1923. Determine the per cent increase of number of 
deaths for the same period. 

6. Find graphically the average per cent increase per year in number 
of automobiles and the average per cent increase per year in number of 
deaths. 

7. In January a certain business house introduced a new department. 
The monthly sales in this department for one year are shown in the following 
table. 


Sales in New Department

Months            Sales in Thousands
January                 $120
February                 200
March                    190
April                    220
May                      300
June                     290
July                     320
August                   400
September                390
October                  430
November                 500
December                 490


Plot the data of the table on millimeter paper. Label this Fig. 1.
Determine the best period for a moving average. Compute the moving
average and record the results in an additional column in the table. Show
the trend for the year by plotting the moving average in Fig. 1.

Now plot the sales data and moving average on semi-logarithmic paper.
Label this graph Fig. 2. Do not smooth the graphs of either figure. Draw
the moving average graphs in dotted lines.

Do the points on the moving average fall nearly on a straight line in Fig. 1?
What does this trend indicate in regard to the growth of the new department?
The line of trend in Fig. 2 is a curve becoming less and less steep toward the
right. What does this indicate in regard to the growth of the new
department?

If the dotted curve of Fig. 2 continues to curve downward at about the
same general rate, what will happen to the rate of increase in sales in about
one more year?

















8. Find the least squares straight line of closest fit to the logarithmic 
graph of Ex. 7, Fig. 2. Interpret the meaning of this line as compared with 
the trend line of Fig. 2. 


Operating Revenues and Taxes of Class 1 Railroads in United States

Year                     Total operating          Taxes, in hundreds
                         revenues, in billions    of millions
Ending June 30:
  1913                        $3.11                    $1.18
  1914                         3.03                     1.35
  1915                         2.87                     1.33
  1916                         3.38                     1.46
Ending December 31:
  1916                         3.60                     1.57
  1917                         4.01                     2.14
  1918                         4.88                     2.23
  1919                         5.14                     2.33
  1920                         6.18                     2.72
  1921                         5.52                     2.76
  1922                         5.56                     3.01


9. Plot the data of the above table of operating revenues and taxes on
millimeter paper. Connect each point to the next by a straight line, not
drawing a smooth curve. Select a scale that will include all the data. Call this
Fig. 1. Label each graph showing what it represents.

Now plot the data on semi-logarithmic paper. Plot the actual figures of
the table. Label each graph showing what it represents and the scale used.
Call this Fig. 2.

10. By inspection of Fig. 1, Ex. 9, would you judge that the fluctuations
in taxes and in total operating revenues are at all alike? Do they increase
together?

By inspection of Fig. 2, Ex. 9, would you judge that the fluctuations in
taxes and in total operating revenues are at all alike? Is your answer the
same as to the first question? If not, what is the reason for a difference in
judgment?

11. Determine from Fig. 2, Ex. 9, the per cent taxes are of total operating 
revenues for the year 1917. What is the easiest way of doing this? 

For the year 1920 total operating revenues were how many times the taxes? 
How is this most readily determined from the graph? 

12. As shown by the graph of Ex. 9, how long after June 15, 1915, did it
take total operating revenues to double? How is the result obtained?
Did taxes double in this same period? If not, what per cent did they
increase?















13. At how long a time before Dec. 31, 1920, were taxes 50 percent of 
what they were at that date? How is this most easily obtained from the 
graph of Ex. 9? 

14. Does the graph of Ex. 9 show any time when taxes and total operating
revenues were changing at the same rate? What shows this? Does this
show in Fig. 1?

15. As shown by the graph, the total operating revenues for 1920 were
what per cent of those for 1919? Also what per cent of those for 1921?
How are these results obtained?



CHAPTER XI 


INDEX NUMBERS 

Index numbers, in their simplest form, are simply the ratios, 
expressed as per cents, of each of a set of values of a variable to 
the value of some one number used as a base. The base usually 
is one of the set of numbers used, or the average of all, or the 
average of a group selected from the entire set. 

1. Example.—The use of such index numbers as above defined
is to facilitate comparisons in the relative values of two or more
sets of variables or of a single variable at different times. Table
LXXIX gives the prices of cows and of milk over a series of years
from 1913 to 1922. These prices are plotted in Fig. 64. Curve
A shows the price of cows and B shows the price of milk, both in
the same unit of measurement, dollars.

Fig. 64.—Prices of cows and milk.

Table LXXIX.—Prices of Cows and Milk¹

Date       Cows, per head   Milk, per gallon,   Index price   Index price   150 times
           on Dec. 15       average             of cows       of milk       price of milk
1913          $57.19           $0.356               85            74            53.4
1914           58.23            0.356               87            74            53.4
1915           56.79            0.352               85            73            52.8
1916           63.18            0.364               94            75            54.6
1917           76.16            0.442              114            92            66.3
1918           85.78            0.556              128           115            83.4
1919           95.54            0.620              143           128            93.0
1920           70.42            0.668              105           139           100.2
1921           53.30            0.584               80           121            87.6
1922           53.21            0.524               80           109            78.6
Average        66.98            0.482              100           100

¹ “Statistical Abstract of United States” for 1922.



Fig. 65.—Indices of prices of cows and milk.














While curve A displays the changes in price of cows very well,
curve B is so flat as to indicate almost no change at all in price
of milk. If B had been plotted to a scale to show its
variations, A would go off the paper. So, for the purposes of
comparison of the variations in the two sets of prices, each set is
reduced to percentage of the average price. These percentages
are shown in the fourth and fifth columns of the table, and are
called indices of the prices. These numbers are purely relative
and completely independent of the units of measurement used in
giving either set of prices. These indices are plotted in Fig. 65,
curve A for index of price of cows and curve B for index of price of
milk. It can now be seen at a glance how one set of relative prices
changed as compared with the other.
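The reduction of each set of prices to percentages of its own average may be sketched in Python as follows. The prices are those of Table LXXIX; the milk prices for 1913 and 1914 are taken as 0.356, an assumption inferred from the last column of the table (53.4 divided by 150). The function name is illustrative. Since the tabulated indices were not computed by machine, a value computed here may differ from the printed one by a unit.

```python
years = list(range(1913, 1923))
cows = [57.19, 58.23, 56.79, 63.18, 76.16, 85.78, 95.54, 70.42, 53.30, 53.21]
# 1913 and 1914 milk prices assumed to be 0.356 (inferred from 53.4 / 150).
milk = [0.356, 0.356, 0.352, 0.364, 0.442, 0.556, 0.620, 0.668, 0.584, 0.524]

def indices(prices):
    """Express each price as a per cent of the average of the whole series."""
    average = sum(prices) / len(prices)
    return [round(100 * p / average) for p in prices]

cow_index = indices(cows)
milk_index = indices(milk)
for year, ci, mi in zip(years, cow_index, milk_index):
    print(year, ci, mi)
```

Both columns are now pure numbers scattered about 100, so that one scale serves for both curves.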

The two sets of prices seem to go up and down together, except 
that the price of milk continued to rise in 1920 after the price of 
cows began to drop. From 1917 on, the price of milk seems to 
lag behind the price of cows. If that part of curve B were moved 
to the left one year, the two curves would nearly coincide for 
the remainder of the period. From 1920 on, the price of milk 
would seem to be unduly high as compared with the price of cows. 
In 1922 the price of cows did not drop. Perhaps the
maintenance of relatively high price for milk stopped the decline in the
price of cows. Of course, these data should be considered in
connection with other data not recorded here.

2. Other Devices.—Sometimes such sets of numbers are
compared by plotting one set to one scale and the other to another.
This is a rather awkward way of accomplishing the result.
The different scales have to be borne in mind.

Another method is to multiply one set by a number that will 
bring the two sets of values to comparable results. Thus the 
last column of Table LXXIX shows 150 times the price of milk. 
The number 150 was chosen because the average price of cows 
is about 150 times the average price of milk. This column of 
figures may be regarded as the price of 150 gallons of milk. 
The dotted curve C (Fig. 64) shows these new values. Curves 
A and C are easily compared. 

3. Comparison of Average Values.—The purpose of index 
numbers, in general, is the same as that of the special form just 
discussed, that is, to reduce series of data, more or less complex, 
to numbers purely relative, which enables direct comparisons to 
be made. The general index number deals with averages and is a 




ratio comparing the average value of a set of numbers at one time 
or place with the average value at some other time or place. 
This, on the face of it, seems like a simple matter. But when the 
various kinds of numbers to be compared, the different sorts of 
averages that may be used, the different methods of weighting 
possible, and the tests that index numbers should satisfy are
considered, it is found that the problem of making a scientifically
satisfactory index number is a rather complex one. It is a
problem upon which the experts are not wholly agreed. Only the
simpler aspects of the problem may be taken up in this work. 
To any one desiring a very complete survey and analysis of the 
subject, Prof. Irving Fisher’s “The Making of Index Numbers” 
is recommended. 

4. Definition.—Suppose an index of the prices of those
commodities that enter cost of living is desired, showing the relative
change in cost of living from one time to another. If all prices
went up or down at the same rate and the quantities of commodities
used did the same, the problem would be a simple one. But
some prices change rapidly, some slowly. Some prices increase
while others decrease. The quantities used vary in more or less
irregular manner also. This complicates the problem.

The example just mentioned is an index of changes in costs from
one time to another. Instead, the changes from one place to
another might be shown. Instead of costs, there might be
considered changes in values of goods produced, quantities of
manufactured articles produced, prices of bonds, rates of exchange,
wages, wholesale prices, retail prices, and many other things in
regard to changes from one time to another or from one place to
another. In fact, the index number serves to make comparison
of the sizes of quantities in one set of things under one situation with
sizes of the quantities in the same set of things under another
situation. For the purposes of this chapter, the sizes of quantities
compared will be prices of commodities in a set, and the different
situations under which they are compared will be different times.
The time unit taken will be the year. Prices of a set of commodities
one year will be compared with prices of the same set another
year. The year with which the comparison is made is called the
base year.

5. Notation.—The following notation will be used.

$p$ = price.
$q$ = quantity.
$p_0$ = price for base year.
$p_k$ = price for the kth year.
$q_0$ = quantity for base year.
$q_k$ = quantity for the kth year.
$p'$ = price of the first commodity in the set.
$p''$ = price of the second commodity in the set.
$p'''$ = price of the third commodity in the set.
$p^{(n)}$ = price of the nth commodity in the set.
$q'$ = quantity of the first commodity produced.
$q''$ = quantity of the second commodity produced.
$q'''$ = quantity of the third commodity produced.
$q^{(n)}$ = quantity of the nth commodity produced.

Thus, $p_3''$ is the price of the second commodity in the set, for the
third year, and $q_4'$ is the quantity of the first commodity in the set
produced in the fourth year; $\Sigma p_k$ = sum of the prices of the
commodities for the kth year, and $\Sigma p_k q_0$ = sum of the products of
price of each commodity for the kth year, multiplied by its quantity
produced in the base year. That is,

$$\Sigma p_k q_0 = p_k' q_0' + p_k'' q_0'' + p_k''' q_0''' + \cdots + p_k^{(n)} q_0^{(n)},$$

if there are n commodities in the set.

Sometimes, instead of a base year, the average over the entire
period, or the average for a decade or other period of time, is used
as a base. In the case of place-to-place comparison, $p_0$ and $q_0$
would be price and quantity produced at the place taken as base
with which comparisons are made. Or $p_0$ and $q_0$ may be
taken as the average price and quantity for all the places
considered.

The index number is nearly always expressed as a per cent.
Sometimes it is expressed as a per thousand, as is done by G. H.
Knibbs in Australia. The London Economist uses a per 2,200,
there being 22 commodities in the set used.¹ This saves dividing
summations by 22.

6. Average Price Relative.—The ratio of a price of a commodity
at one time to its price at another time is called a price relative.
Thus, $p_k'/p_0'$ is the price relative of the first commodity for the
kth year as compared with the base year. The average price
relative of n commodities is called an index number of prices.
It equals

$$\frac{\Sigma \dfrac{p_k}{p_0}}{n}.$$

¹ Prof. Irving Fisher, “Making of Index Numbers,” p. 371.

7. Example.—The following two tables furnish data for
illustrative problems.

Table LXXX.—Production of Grains in the United States
In millions of bushels¹

Date     Corn    Wheat    Oats    Rye    Barley   Buckwheat   Rice (rough)
1913    2,447      763   1,122     41      178        14           26
1915    2,995    1,026   1,549     54      229        15           29
1917    3,065      637   1,593     63      212        16           35
1919    2,811      968   1,184     75      148        14           42
1921    3,069      815   1,078     62      155        14           38

¹ From “Statistical Abstract of the United States for 1922,” pp. 166, 167.


Table LXXXI.—Price of Grains in the United States, in Cents per Bushel
Farm values as of Dec. 1¹

Date     Corn    Wheat    Oats     Rye    Barley   Buckwheat   Rice (rough)
1913     69.1     79.9    39.2    63.4     53.7      75.5          85.8
1915     57.5     91.9    36.1    83.4     51.6      78.7          90.6
1917    127.9    200.8    66.6   166.0    113.7     160.0         189.6
1919    134.5    214.9    70.4   133.2    120.6     146.1         266.6
1921     42.3     92.6    30.2    69.7     41.9      81.2          95.2

¹ From “Statistical Abstract of the United States for 1922,” pp. 166, 167.


Using 1913 as base year, the price relatives of the grains
tabulated in Table LXXXI are as follows. Each ratio is
multiplied by 100.

Table LXXXII.—Price Relatives of Grains in the United States¹

Grain          1913    1915    1917    1919    1921
Corn            100      83     185     195      61
Wheat           100     115     252     269     116
Oats            100      92     170     180      77
Rye             100     132     262     210     110
Barley          100      96     212     224      78
Buckwheat       100     104     212     193     107
Rice            100     105     221     311     111
Total           700     727   1,513   1,582     660

¹ Divisions by slide rule.


The price relative of corn for 1915 is 57.5/69.1 × 100 = 83.
For rye for 1917, 166.0/63.4 × 100 = 262. The index numbers
of prices of grains are:

1913      700/7 = 100
1915      727/7 = 104
1917    1,513/7 = 216
1919    1,582/7 = 226
1921      660/7 = 94

8. Interpretation.—The interpretation is that for these grains, 
according to these index numbers, in 1915 the general average 
price was 4 per cent higher than in 1913; for 1917, 116 per cent 
higher than in 1913; for 1919, 126 per cent higher; and for 1921, 
6 per cent lower than in 1913. 
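The simple arithmetic index numbers may be recomputed directly from the prices of Table LXXXI. A minimal sketch in Python follows; the dictionary layout and function name are assumptions made for illustration. The divisions are carried to full precision, so an individual price relative may differ by a unit from the slide rule figures of Table LXXXII, although the rounded index numbers agree.

```python
years = [1913, 1915, 1917, 1919, 1921]

# Prices in cents per bushel, from Table LXXXI, in the order of `years`.
prices = {
    "corn":      [69.1, 57.5, 127.9, 134.5, 42.3],
    "wheat":     [79.9, 91.9, 200.8, 214.9, 92.6],
    "oats":      [39.2, 36.1, 66.6, 70.4, 30.2],
    "rye":       [63.4, 83.4, 166.0, 133.2, 69.7],
    "barley":    [53.7, 51.6, 113.7, 120.6, 41.9],
    "buckwheat": [75.5, 78.7, 160.0, 146.1, 81.2],
    "rice":      [85.8, 90.6, 189.6, 266.6, 95.2],
}

def simple_arithmetic_index(prices, year_pos, base_pos=0):
    """Average of the price relatives, each expressed as a per cent."""
    relatives = [100 * s[year_pos] / s[base_pos] for s in prices.values()]
    return sum(relatives) / len(relatives)

index_numbers = {
    year: round(simple_arithmetic_index(prices, i)) for i, year in enumerate(years)
}
print(index_numbers)
```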

9. Weighted Averages.—This index number of prices is only
one of many possible index numbers, some good, some bad. No
account is here taken of the quantities produced. Each year
there was much more corn produced than there was rye. A rise
in the price of corn is more important than a rise in the price of
rye. So, instead of a simple arithmetic mean, a weighted
arithmetic mean suggests itself, weighting each price by the quantity
produced. For the base year, there would be a weighted
average of

$$\frac{\Sigma p_0 q_0}{\Sigma q_0} = w_0.$$

For year k,

$$\frac{\Sigma p_k q_k}{\Sigma q_k} = w_k.$$

The index for year k would be

$$\frac{w_k}{w_0} = \frac{\Sigma p_k q_k}{\Sigma q_k} \div \frac{\Sigma p_0 q_0}{\Sigma q_0} = \frac{\Sigma p_k q_k}{\Sigma p_0 q_0} \div \frac{\Sigma q_k}{\Sigma q_0}.$$

This is Fisher’s formula 52.

Another form that would suggest itself is obtained by weighting
each price relative by the value of the corresponding commodity
produced in the base year, that is, by $p_0 q_0$.
This formula would be

$$\frac{\Sigma p_0 q_0 \dfrac{p_k}{p_0}}{\Sigma p_0 q_0}.$$

This is Fisher’s formula 3.

It reduces to

$$\frac{\Sigma p_k q_0}{\Sigma p_0 q_0},$$

which is Fisher’s formula 53. In this form, it appears that the
quantities produced in the base year are used as weights for both
years. This formula is used by the United States Bureau of
Labor.¹
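The weighted forms may be tried on the grain data of Tables LXXX and LXXXI. The following is a minimal sketch in Python for 1917 on 1913 as base; the helper `dot` and the variable names are assumptions introduced for illustration. It also checks numerically that the average of price relatives weighted by base-year values gives the same figure as the aggregative form to which it reduces.

```python
# Order of grains: corn, wheat, oats, rye, barley, buckwheat, rice.
p0 = [69.1, 79.9, 39.2, 63.4, 53.7, 75.5, 85.8]        # 1913 prices, cents
pk = [127.9, 200.8, 66.6, 166.0, 113.7, 160.0, 189.6]  # 1917 prices, cents
q0 = [2447, 763, 1122, 41, 178, 14, 26]                # 1913 production
qk = [3065, 637, 1593, 63, 212, 16, 35]                # 1917 production

def dot(u, v):
    """Sum of products of corresponding items."""
    return sum(a * b for a, b in zip(u, v))

# Ratio of weighted average prices, w_k / w_0 (formula 52).
w0 = dot(p0, q0) / sum(q0)
wk = dot(pk, qk) / sum(qk)
formula_52 = 100 * wk / w0

# Price relatives weighted by base-year values p0*q0 (formula 3) ...
formula_3 = 100 * sum(p * q * (k / p) for p, q, k in zip(p0, q0, pk)) / dot(p0, q0)
# ... and its reduced, aggregative form (formula 53).
formula_53 = 100 * dot(pk, q0) / dot(p0, q0)

print(round(formula_52, 1), round(formula_3, 1), round(formula_53, 1))
```

The weighted index for 1917 comes out well below the simple arithmetic figure of 216, since corn and oats, which carry most of the weight, rose less than wheat and rye.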

¹ Prof. Irving Fisher, “The Making of Index Numbers.”

10. Other Averages Used.—The fact that there are other
averages than the arithmetic mean suggests their use in making index
numbers. Professor Fisher has discussed 134 different index
number formulas divided into six classes according to the type
of average used. The only averages he uses are arithmetic,
harmonic, geometric, median, mode, and aggregative.

The formula for the simple arithmetic index number has
already been given as

$$\frac{\Sigma \dfrac{p_k}{p_0}}{n}.$$

The simple harmonic index number formula is

$$\frac{n}{\Sigma \dfrac{p_0}{p_k}},$$

being the harmonic mean of the price relatives.

The simple geometric index number is given by

$$\sqrt[n]{\frac{p_k'}{p_0'} \cdot \frac{p_k''}{p_0''} \cdots \frac{p_k^{(n)}}{p_0^{(n)}}}.$$

It is the geometric mean of the price relatives.

The simple median index number is the median of the price
relatives.

The simple mode index number is the mode of price relatives,
that one which occurs oftenest.

The formula for simple aggregative index number is

$$\frac{\Sigma p_k}{\Sigma p_0}.$$

It is the ratio of aggregate price of all the commodities for the
kth year to their aggregate price for the base year, multiplied by
100.

The other 128 formulas are derived from these six by various
systems of weighting, by “crossing” formulas, and by comparison
with the value ratio

$$\frac{\Sigma p_k q_k}{\Sigma p_0 q_0}.$$
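The simple index numbers of the several types may be compared on one set of figures. A minimal sketch in Python, using the 1913 and 1917 grain prices of Table LXXXI; the variable names are illustrative. The mode is omitted, since no price relative occurs more than once among so few items.

```python
import math
import statistics

# Order of grains: corn, wheat, oats, rye, barley, buckwheat, rice.
p0 = [69.1, 79.9, 39.2, 63.4, 53.7, 75.5, 85.8]        # 1913 prices, cents
pk = [127.9, 200.8, 66.6, 166.0, 113.7, 160.0, 189.6]  # 1917 prices, cents

relatives = [k / b for k, b in zip(pk, p0)]   # price relatives p_k / p_0
n = len(relatives)

arithmetic = 100 * sum(relatives) / n
harmonic = 100 * n / sum(1 / r for r in relatives)
geometric = 100 * math.prod(relatives) ** (1 / n)
median = 100 * statistics.median(relatives)
aggregative = 100 * sum(pk) / sum(p0)

for name, value in [("arithmetic", arithmetic), ("harmonic", harmonic),
                    ("geometric", geometric), ("median", median),
                    ("aggregative", aggregative)]:
    print(name, round(value, 1))
```

For any set of positive price relatives the harmonic index cannot exceed the geometric, nor the geometric the arithmetic, which is one reason the several formulas do not give the same result.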

11. Systems of Weighting.—Professor Fisher uses four systems
of weights:

I. $p_0 q_0$.

II. $p_0 q_k$.

III. $p_k q_0$.

IV. $p_k q_k$.

Other systems of weighting may easily be devised.

Using I, the simple arithmetic formula becomes

$$\frac{\Sigma p_0 q_0 \dfrac{p_k}{p_0}}{\Sigma p_0 q_0},$$

Fisher’s 3. It has just been seen to reduce to Fisher’s 53.

II gives

$$\frac{\Sigma p_0 q_k \dfrac{p_k}{p_0}}{\Sigma p_0 q_k},$$

Fisher’s 5. It reduces to

$$\frac{\Sigma p_k q_k}{\Sigma p_0 q_k},$$

Fisher’s 59 of the aggregate type.

III gives

$$\frac{\Sigma p_k q_0 \dfrac{p_k}{p_0}}{\Sigma p_k q_0},$$

Fisher’s 7.

IV gives

$$\frac{\Sigma p_k q_k \dfrac{p_k}{p_0}}{\Sigma p_k q_k},$$

Fisher’s 9.



The application to other types of averages will not be recited 
here. 

Among so many index numbers confusion is naturally expected. 
They do not give the same results. Some, then, must be better 
than others in giving a fair and just comparison of the fluctuations 
in average price, or whatever the numbers are applied to. 

12. Three Tests.—There are certain tests that an index 
number should satisfy in order to be perfectly satisfactory. 
They are: 

The commodity reversal test. 

The time reversal test. 

The factor reversal test. 

13. The commodity reversal test is that the order of the 
commodities may be reversed, or, in fact, that they may be 
placed in any order whatever without making any difference in 
the result. This simply means that the superscripts on the 
letters 

$$p', p'', p''', p^{iv}, \cdots, p^{(n)}$$

may be rearranged in any manner desired. It is evident that this 
test is satisfied by all the formulas, since the sum of the price 
relatives, the product of the price relatives, the sum of 
their reciprocals, their median, their mode, their aggregate are 
none of them affected by change in the order of their occurrence. 

14. The time reversal test interchanges the subscripts k and 0. 
That is, if, of two years, the ratio of the average price for the 




second year to that of the first year is found, the interchange will 
give the ratio of the average price for the first year to that for 
the second year. The base year is simply interchanged with 
the other year, and the other year is made the base year. The 
resulting ratio (not index number) should be the reciprocal of 
the ratio obtained before the change was made. The product of 
the two should be unity. The product of the two resulting index 
numbers divided by 100 should be 100 per cent. This is evident, 
for, if with, say, 1900 as base, it is found that for 1905 the average 
price has doubled, the index for 1905 on 1900 as base is 200. 
Of course, if the average price has doubled from 1900 to 1905, 
then the 1900 average price ought to be half that for 1905 and the 
index for 1900 on 1905 as base would be 50. The product of the 
two ratios is 

$$\frac{2}{1} \times \frac{1}{2} = 1.$$


The product of the two index numbers divided by 100 is 


$$\frac{200 \times 50}{100} = 100.$$


If there is any doubt in this argument in regard to going for¬ 
ward or backward in time, use index numbers of prices at one 
place as compared with prices at another place. Of two cities, 
A and B, there is no inherent reason for using one for the base 
rather than the other. Suppose the index of prices at A is obtained 
with the average price at B as base and found to be 200. That 
is, prices at A average twice as much as at B. Then prices at B 
must average half as much as at A. The index of prices at B, 
using average price at A as base, must be 50, if a satisfactory 
method of getting the index has been used. 

15. Factor Reversal Test.—Just as the commodity reversal 
test interchanges the superscripts of the letters, and the time 
reversal test interchanges the subscripts, so the factor reversal 
test interchanges the two factors p and q. An index number 
satisfying the commodity reversal test remains unchanged when 
the test is applied. This must be expected, because $\sum p$ and $\sum q$ 
each remain unchanged, no matter what the order of summation. 
The index number satisfying the time reversal test becomes its 
own reciprocal, when the test is applied. This is to be expected, since 

$$\frac{p_k}{p_0} \times \frac{p_0}{p_k} = 1$$

for each individual commodity. A fair index of average prices, 
then, should satisfy the same condition. 

The ratio of total values, 

$$\frac{\sum p_k q_k}{\sum p_0 q_0},$$

is a fixed determinate constant no matter what sort of index 
numbers may be used. Now, 

$$\frac{p_k}{p_0} \times \frac{q_k}{q_0} = \frac{p_k q_k}{p_0 q_0}$$

for each individual commodity. So the ratio of average prices 
for two years multiplied by the ratio of average quantities pro¬ 
duced for the same two years ought to give the ratio of total 
values. An index number of price, properly weighted, gives the 
general average ratio of prices for the year k as compared with the 
year 0. Also, an index number of quantity produced, properly 
weighted, gives the general average ratio of quantities for the 
year k as compared with the year 0. If, in a formula for an index 
number of price weighted according to quantity, the letters p 
and q are interchanged, the result is the index number of quantity 
weighted according to price. The factor reversal test requires 
that the product of the two resulting index numbers, when p 
and q are interchanged, should equal the ratio of total values. 

Bear in mind that the formulas for index numbers and these 
three reversal tests apply not only to price and quantity but to 
any two sets of numbers the notation for which conforms to the 
notation used here. For example, p may be freight rates of a 
certain schedule of goods, $p_k$ for water-borne commerce, and $p_0$ 
for rail, while $q_k$ and $q_0$ are the numbers of tons shipped of the 
same goods by water and by rail respectively. Then $\sum p_k q_k$ is 
the total amount of freight paid for water-borne commerce and 
$\sum p_0 q_0$ for rail. Or, p may be the number of rows of corn on an 
ear and q the number of kernels in a row (assuming the same number 
in each row on any one ear). Let $p_k$ and $q_k$ be for Yellow 
Dent, and $p_0$, $q_0$ for Country Gentleman. Let $p'$, $q'$ be for the 
shortest ear in a random sample of n ears, $p''$, $q''$ for the next 
longer, $p'''$, $q'''$ for the next, and so on to $p^{(n)}$, $q^{(n)}$ for the longest 
of the set. In this case $\sum p_k q_k$ is the total amount of Yellow Dent 
corn and $\sum p_0 q_0$ is the total amount of Country Gentleman. 

Manifestly, all three reversal tests ought to apply to the corn 
in the above example. For the same reason, they should apply to 
price and quantity. 




Now examine these tests as they apply to some of the formulas 
already given. The commodity reversal test may be omitted 
from the discussion for obvious reasons previously given. 

16. Test of Average Price Relative.—Take the simple arith¬ 
metic mean of price relatives 

$$\frac{\sum \frac{p_k}{p_0}}{n}.$$


This formula has been used a great deal by different sources of 
index numbers. 

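Applying the time reversal test, the result should be 

$$\frac{\sum \frac{p_k}{p_0}}{n} \times \frac{\sum \frac{p_0}{p_k}}{n} = 1.$$

For simplicity take only two commodities. Then, 

$$\frac{\frac{p_k'}{p_0'} + \frac{p_k''}{p_0''}}{2} \times \frac{\frac{p_0'}{p_k'} + \frac{p_0''}{p_k''}}{2} = \frac{\frac{p_k'}{p_0'} \cdot \frac{p_0'}{p_k'} + \frac{p_k''}{p_0''} \cdot \frac{p_0''}{p_k''} + \frac{p_k'}{p_0'} \cdot \frac{p_0''}{p_k''} + \frac{p_k''}{p_0''} \cdot \frac{p_0'}{p_k'}}{4}$$

is obtained. 

The first two terms of the numerator are each 1. Neither of 
the other terms of the numerator equals 1 unless $p'' = cp'$ for each 
year. So unless the prices change proportionally, the product 
of the two index numbers cannot be expected to equal 1. 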

Take the following simple example of two commodities, A 
and B. 


              Prices                     Quantities
        1900      1905              1900      1905
A         20        40        A        3         6
B        100        50        B        4         2








The simple arithmetic index of prices of 1905 on those of 1900 
as base is 


$$\frac{\sum \frac{p_k}{p_0}}{n} = \frac{\frac{40}{20} + \frac{50}{100}}{2} = \frac{2 + \frac{1}{2}}{2} = 125 \text{ per cent.}$$

This makes the general average price of these commodities 25 
per cent higher in 1905 than in 1900. 

Applying the time reversal test and using 1905 as a base, the 
result would be 

$$\frac{\frac{20}{40} + \frac{100}{50}}{2} = \frac{\frac{1}{2} + 2}{2} = 1\tfrac{1}{4} = 125 \text{ per cent.}$$

According to this, the general price of these commodities in 
1900 was 25 per cent higher than in 1905. The absurd conclusion 
is thus made that each year the average price was 25 per cent 
higher than it was the other year. Of course, this was an extreme 
case in which half of the commodities doubled in price and the 
rest decreased one-half. The algebra of the formula has shown, 
however, that the two indices cannot be expected to be recipro¬ 
cals of each other, unless prices change at a constant ratio. If 
this happened, there would be no need of index numbers anyway. 
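
The failure of the time reversal test can be checked numerically. The following is a minimal sketch in Python, using the prices of commodities A and B from the example; the function name `mean_price_relative` is an illustrative choice and not part of the text.

```python
# Prices of the two commodities A and B from the example in the text.
prices_1900 = {"A": 20, "B": 100}
prices_1905 = {"A": 40, "B": 50}


def mean_price_relative(given, base):
    """Simple arithmetic mean of price relatives, as a ratio (not per cent)."""
    relatives = [given[name] / base[name] for name in base]
    return sum(relatives) / len(relatives)


forward = mean_price_relative(prices_1905, prices_1900)   # 1905 on 1900 as base
backward = mean_price_relative(prices_1900, prices_1905)  # 1900 on 1905 as base

# The time reversal test would require this product to be 1.
product = forward * backward
print(forward, backward, product)
```

Both indices come out at 1.25, so their product is well above unity, which is the inconsistency described above.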

It follows that the simple arithmetic mean of price relatives is 
a very poor form of index number. The factor reversal test 
would give 


$$\frac{\sum \frac{q_k}{q_0}}{n}.$$

The product 

$$\frac{\sum \frac{p_k}{p_0}}{n} \times \frac{\sum \frac{q_k}{q_0}}{n}$$

does not equal 

$$\frac{\sum p_k q_k}{\sum p_0 q_0},$$

and the factor reversal test is not satisfied. 

16. Tests of Other Formulas.—Consider Fisher’s formula 52 

(p. 280): 

$$\frac{w_k}{w_0} = \frac{\sum p_k q_k}{\sum q_k} \div \frac{\sum p_0 q_0}{\sum q_0}.$$

For the time reversal, interchange the subscripts k and 0. 
This gives 

$$\frac{\sum p_0 q_0}{\sum q_0} \div \frac{\sum p_k q_k}{\sum q_k} = \frac{w_0}{w_k}.$$




It is seen, at once, that the second result is the reciprocal of the 
first. 

$$\frac{w_k}{w_0} \times \frac{w_0}{w_k} = 1.$$

The time reversal test is satisfied. 

For the factor reversal test, interchange letters p and q. 

This gives 

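$$\frac{\sum q_k p_k}{\sum p_k} \div \frac{\sum q_0 p_0}{\sum p_0}.$$

The product of the two results, 

$$\frac{\dfrac{\sum p_k q_k}{\sum q_k}}{\dfrac{\sum p_0 q_0}{\sum q_0}} \times \frac{\dfrac{\sum q_k p_k}{\sum p_k}}{\dfrac{\sum q_0 p_0}{\sum p_0}} = \frac{(\sum p_k q_k)^2}{(\sum p_0 q_0)^2} \cdot \frac{\sum p_0 \cdot \sum q_0}{\sum p_k \cdot \sum q_k},$$

is seen not to equal the ratio of values, 

$$\frac{\sum p_k q_k}{\sum p_0 q_0}.$$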

The factor reversal test is not satisfied. 

For the data on page 285, using 1900 as base, 

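$$w_k = \frac{\sum p_k q_k}{\sum q_k} = 42.5, \qquad w_0 = \frac{\sum p_0 q_0}{\sum q_0} = 65.7,$$

$$\frac{w_k}{w_0} = \frac{42.5}{65.7} = 0.65.$$

Applying factor reversal, 

$$w_k' = \frac{\sum q_k p_k}{\sum p_k} = \frac{6 \times 40 + 2 \times 50}{40 + 50} = 3.78,$$

$$w_0' = \frac{\sum q_0 p_0}{\sum p_0} = \frac{3 \times 20 + 4 \times 100}{20 + 100} = 3.83,$$

$$\frac{w_k'}{w_0'} = \frac{3.78}{3.83} = 0.99.$$

The product of the two results is 

0.65 X 0.99 = 0.64. 

This is considerably smaller than the total value ratio, 

$$\frac{\sum p_k q_k}{\sum p_0 q_0} = \frac{340}{460}.$$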
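
The arithmetic for Fisher’s 52 may be reproduced as follows. This is only a sketch in Python on the two-commodity data for A and B; the helper names `dot` and `formula_52` are assumptions made for the illustration.

```python
# Two-commodity data (A, B) from the example table in the text.
p0, pk = [20, 100], [40, 50]   # prices in 1900 (base) and in 1905
q0, qk = [3, 4], [6, 2]        # quantities in 1900 and in 1905


def dot(xs, ys):
    """Sum of products, for example the total value sum(p * q)."""
    return sum(x * y for x, y in zip(xs, ys))


def formula_52(p_base, q_base, p_given, q_given):
    """Ratio w_k / w_0 of the quantity-weighted average prices."""
    w_given = dot(p_given, q_given) / sum(q_given)
    w_base = dot(p_base, q_base) / sum(q_base)
    return w_given / w_base


price_index = formula_52(p0, q0, pk, qk)
time_reversed = formula_52(pk, qk, p0, q0)    # subscripts k and 0 interchanged
quantity_index = formula_52(q0, p0, qk, pk)   # letters p and q interchanged
value_ratio = dot(pk, qk) / dot(p0, q0)

print(round(price_index, 2), round(quantity_index, 2))
print(round(price_index * quantity_index, 2), round(value_ratio, 2))
```

The product with the time-reversed index is 1, while the product with the quantity index falls short of the value ratio, in agreement with the two conclusions reached above.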

The time reversal test applied to Fisher’s 3 in the form of 53, 

$$\frac{\sum p_k q_0}{\sum p_0 q_0}$$

(p. 280), gives the following results. With 1900 as base the index 
number is 

$$\frac{40 \times 3 + 50 \times 4}{20 \times 3 + 100 \times 4} = \frac{320}{460} = 0.696 = 70 \text{ per cent.}$$

With 1905 as base, 

$$\frac{20 \times 6 + 100 \times 2}{40 \times 6 + 50 \times 2} = \frac{320}{340} = 0.941 = 94 \text{ per cent.}$$



These results say that the average price for 1905 is 70 per cent 
of that for 1900, and that the average price for 1900 is 94 per cent 
of that for 1905. Or, in each year the prices are lower than they 
were the other year, which is absurd. The time reversal test is 
not satisfied. 
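
The two values just found for Fisher’s 53 can be recomputed with a short sketch in Python. The function name `formula_53` is an illustrative assumption; the data are those of commodities A and B.

```python
# Prices and quantities of commodities A and B in the two years.
p_1900, p_1905 = [20, 100], [40, 50]
q_1900, q_1905 = [3, 4], [6, 2]


def formula_53(p_base, q_base, p_given):
    """Aggregate of given-year prices over base-year prices,
    both weighted by the base-year quantities."""
    numerator = sum(p * q for p, q in zip(p_given, q_base))
    denominator = sum(p * q for p, q in zip(p_base, q_base))
    return numerator / denominator


on_1900 = formula_53(p_1900, q_1900, p_1905)  # 1905 on 1900 as base
on_1905 = formula_53(p_1905, q_1905, p_1900)  # 1900 on 1905 as base

print(round(on_1900, 3), round(on_1905, 3), round(on_1900 * on_1905, 3))
```

Since both ratios are below 1, their product cannot be unity.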

By examining index number formulas, it will be found that a 
number of those in common use will satisfy the time reversal test, 
while but few satisfy the factor reversal test. The problem, then, 
is to find a formula that satisfies all tests and is not too complex 
for practical use. 

17. Bias.—Many index numbers have what is known as bias; 
that is, they consistently give results too large, or else too small, 
to be a fair representation of the facts. The simple arithmetic 
mean of price relatives has an upward bias.¹ The simple 
harmonic mean has a downward bias. Also each system of 
weighting has its bias. Weighting each price relative by the 
value of the commodity in the base year, $p_0 q_0$, gives a downward 
bias. Weighting each price relative by the value of the commodity 
for a given year, $p_k q_k$, gives an upward bias. 

¹ For a general discussion of bias see Prof. Irving Fisher, “Making of 
Index Numbers.” 

18. Crossing.—Fisher’s 3 (p. 280) is an arithmetic type 
weighted by $p_0 q_0$. Fisher’s 19, 

$$\frac{\sum p_k q_k}{\sum p_k q_k \frac{p_0}{p_k}},$$

is a harmonic type weighted by $p_k q_k$. Professor Fisher gives good 
reasons for using an average of these two. The geometric 
mean seems to be the fairest average to use, lying between the 
arithmetic mean and the harmonic mean. Taking the average 
of two index number formulas is called crossing them. The 
geometric cross, between Fisher’s 3 and 19, is called by him the 
“Ideal” index number. It is 

$$\sqrt{\frac{\sum p_0 q_0 \frac{p_k}{p_0}}{\sum p_0 q_0} \times \frac{\sum p_k q_k}{\sum p_k q_k \frac{p_0}{p_k}}}.$$

This reduces to 

$$\sqrt{\frac{\sum p_k q_0}{\sum p_0 q_0} \times \frac{\sum p_k q_k}{\sum p_0 q_k}}.$$

In this latter form the first fraction is ratio of values, the value 
of each separate commodity being weighted by the quantity of 
that commodity produced in the base year. The second fraction 
is the same, except that the weights are the quantities produced 
in year k. The first fraction is Fisher’s 53, and the second, 
Fisher’s 59, two separate index number formulas. 

19. Fisher’s “Ideal” Satisfies the Tests.—This formula, 
Fisher’s Ideal, satisfies all three tests. That it satisfies the time 
reversal test may be seen by putting k for every 0, and 0 for 
every k in the formula and multiplying the result by the original 
formula. 


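$$\sqrt{\frac{\sum p_0 q_k}{\sum p_k q_k} \times \frac{\sum p_0 q_0}{\sum p_k q_0}} \times \sqrt{\frac{\sum p_k q_0}{\sum p_0 q_0} \times \frac{\sum p_k q_k}{\sum p_0 q_k}} = \sqrt{\frac{\sum p_0 q_k \cdot \sum p_0 q_0 \cdot \sum p_k q_0 \cdot \sum p_k q_k}{\sum p_k q_k \cdot \sum p_k q_0 \cdot \sum p_0 q_0 \cdot \sum p_0 q_k}}.$$

The quantities under the radical cancel out, giving $\sqrt{1} = 1$, 
and the test is satisfied. 

To see that the factor reversal test is satisfied, write p for each 
q, and q for each p. Multiply the result by the original formula 
and see if it gives the ratio of total values. This process gives 

$$\sqrt{\frac{\sum q_k p_0}{\sum q_0 p_0} \times \frac{\sum q_k p_k}{\sum q_0 p_k}} \times \sqrt{\frac{\sum p_k q_0}{\sum p_0 q_0} \times \frac{\sum p_k q_k}{\sum p_0 q_k}} = \sqrt{\frac{(\sum p_k q_k)^2}{(\sum p_0 q_0)^2}} = \frac{\sum p_k q_k}{\sum p_0 q_0},$$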

the ratio of total values. So the test is satisfied. 
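
Both properties of the Ideal formula can be confirmed numerically. The sketch below, in Python, applies the formula to the data of commodities A and B used earlier in the chapter; the names `agg` and `ideal` are assumptions for the illustration.

```python
import math

# Prices and quantities of commodities A and B, base year 0 and year k.
p0, pk = [20, 100], [40, 50]
q0, qk = [3, 4], [6, 2]


def agg(p, q):
    """Aggregate sum(p * q)."""
    return sum(a * b for a, b in zip(p, q))


def ideal(p_base, q_base, p_given, q_given):
    """Geometric mean of Fisher's 53 and Fisher's 59."""
    f53 = agg(p_given, q_base) / agg(p_base, q_base)
    f59 = agg(p_given, q_given) / agg(p_base, q_given)
    return math.sqrt(f53 * f59)


price_index = ideal(p0, q0, pk, qk)
time_reversed = ideal(pk, qk, p0, q0)     # k and 0 interchanged
quantity_index = ideal(q0, p0, qk, pk)    # p and q interchanged
value_ratio = agg(pk, qk) / agg(p0, q0)

print(price_index * time_reversed)           # time reversal: should be 1
print(price_index * quantity_index, value_ratio)
```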

20. Cross-weighting.—Another very good formula is obtained 
by weighting each p by the arithmetic cross of the weights $q_0$ 
and $q_k$. This gives 

$$\frac{\sum \frac{q_0 + q_k}{2} p_k}{\sum \frac{q_0 + q_k}{2} p_0}.$$

The common factor to every term of the summations cancels 
out giving 

$$\frac{\sum (q_0 + q_k) p_k}{\sum (q_0 + q_k) p_0}.$$

The virtue of this formula is that it is easier to compute than 
the Ideal, and is found empirically to differ from it but little in 
results. 
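
As a rough illustration of how close the two formulas run, the sketch below computes both in Python for the two commodities A and B of the earlier example. That example was chosen as an extreme case, so the gap shown is probably larger than would be met in ordinary data; the function names are illustrative assumptions.

```python
import math

# Prices and quantities of commodities A and B, base year 0 and year k.
p0, pk = [20, 100], [40, 50]
q0, qk = [3, 4], [6, 2]


def cross_weighted(p_base, q_base, p_given, q_given):
    """Aggregate index with each price weighted by q_0 + q_k."""
    weights = [a + b for a, b in zip(q_base, q_given)]
    numerator = sum(w * p for w, p in zip(weights, p_given))
    denominator = sum(w * p for w, p in zip(weights, p_base))
    return numerator / denominator


def ideal(p_base, q_base, p_given, q_given):
    """Fisher's Ideal: geometric mean of formulas 53 and 59."""
    agg = lambda p, q: sum(a * b for a, b in zip(p, q))
    f53 = agg(p_given, q_base) / agg(p_base, q_base)
    f59 = agg(p_given, q_given) / agg(p_base, q_given)
    return math.sqrt(f53 * f59)


cross = cross_weighted(p0, q0, pk, qk)
fisher = ideal(p0, q0, pk, qk)
print(round(cross, 3), round(fisher, 3))
```

The cross-weighted formula needs no root extraction, which is the saving in labor referred to above.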

21. Arbitrary Weights.—Sometimes information as to quanti¬ 
ties produced is difficult to obtain. The U. S. census reports 
give this information for every 10 years, but not for every year 
or oftener. Arbitrary weights may then be assumed, based on 
estimates. An example of such index number formula is 

$$\frac{\sum w p_k}{\sum w p_0}.$$

This is Fisher’s 9,051. It is of the aggregative type, the price 
of each commodity having an arbitrary constant weight. It is 
the same as Fisher’s 53, except that arbitrary weights are used 
instead of quantities produced in the base year. 

22. A Combination That Facilitates Computation.—Professor 
Fisher uses a combination of 53 and 9,051 for a weekly index of 
200 commodities. With the 28 most important of the 200 commodities, 
great care is used in determining the $q_0$ for each one. 
For the other 172 commodities, powers of 10 are chosen as arbi¬ 
trary weights, w, of each. The power of 10 used for each com¬ 
modity is that nearest the actual statistical value of the quantity 
of that commodity. Some of these 172 commodities have the 
same weight. Their prices may be added and the decimal point 
placed according to what power of 10 is used as weight. For 
each of the 28, a separate product $p_k q_0$ must be obtained. The 
result is said to be quite close to that obtained by 53 alone. 

23. Computation.—Now calculate Fisher’s 53, $\sum p_k q_0 / \sum p_0 q_0$, 
for the data of Tables LXXX and LXXXI. The computations 
are shown in Table LXXXIII. 

The grains are placed in column to facilitate the additions. 
The column headings are self-explanatory. The figures of the 
product columns are given to the nearest 100. This will assure 
accuracy in the results to the nearest 1 per cent. 

According to these results the average price of the seven grains 
listed was, in 1917, 99 per cent greater than in 1913. In 1921 the 
average price was 77 per cent of that of 1913. 
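
The same indices can be obtained without rounding the products to the nearest 100. The following Python sketch uses the prices and the 1913 quantities as printed in Table LXXXIII; the function name `fisher_53` is an assumption for the illustration.

```python
# Order of grains: Corn, Wheat, Oats, Rye, Barley, Buckwheat, Rice.
# Figures are those printed in Table LXXXIII.
q_1913 = [2447, 763, 1112, 41, 178, 14, 26]
p_1913 = [69.1, 79.9, 39.2, 63.4, 53.7, 75.5, 85.8]
p_1917 = [127.9, 200.8, 66.6, 166.0, 113.7, 160.0, 189.6]
p_1921 = [42.3, 92.6, 30.2, 69.7, 41.9, 81.2, 95.2]


def fisher_53(p_given, p_base, q_base):
    """Aggregate price index in per cent, weighted by base-year quantities."""
    numerator = sum(p * q for p, q in zip(p_given, q_base))
    denominator = sum(p * q for p, q in zip(p_base, q_base))
    return 100 * numerator / denominator


index_1917 = fisher_53(p_1917, p_1913, q_1913)
index_1921 = fisher_53(p_1921, p_1913, q_1913)
print(round(index_1917), round(index_1921))
```

To the nearest 1 per cent the exact products give the same two index numbers as the slide-rule figures.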


Table LXXXIII.—Index Numbers for Prices of Grains in the United 
States for Years 1917 and 1921 on 1913 as Base Year¹ 

Formula, $\sum p_k q_0 / \sum p_0 q_0$ 

                       1913                      1917                  1921
Grain         p_0     q_0   p_0 q_0/100     p_1   p_1 q_0/100     p_2   p_2 q_0/100
Corn         69.1   2,447       1,691     127.9      3,129       42.3      1,035
Wheat        79.9     763         610     200.8      1,531       92.6        707
Oats         39.2   1,112         435      66.6        741       30.2        336
Rye          63.4      41          26     166.0         68       69.7         29
Barley       53.7     178          95     113.7        202       41.9         75
Buckwheat    75.5      14          11     160.0         22       81.2         11
Rice         85.8      26          22     189.6         49       95.2         25
Total                           2,890                5,742                 2,218

¹ Computations by slide rule. 

Index, 

$$\frac{5{,}742}{2{,}890} \times 100 = 199; \qquad \frac{2{,}218}{2{,}890} \times 100 = 77.$$

Table LXXXIV shows the computations for Fisher’s Ideal, 
for 1921 on 1913 as base year. 

Table LXXXIV.—Index Number of Prices of Grains in the United 
States for 1921 on 1913 Base¹ 

Fisher’s Ideal, $\sqrt{\frac{\sum p_k q_0}{\sum p_0 q_0} \times \frac{\sum p_k q_k}{\sum p_0 q_k}}$ 

                       1913                               1921
Grain         p_0     q_0   p_0 q_0/100     p_1   p_1 q_1/100   p_1 q_0/100   p_0 q_1/100
Corn         69.1   2,447       1,691      42.3      1,298         1,035         2,121
Wheat        79.9     763         610      92.6        755           707           651
Oats         39.2   1,112         435      30.2        325           336           422
Rye          63.4      41          26      69.7         43            29            39
Barley       53.7     178          95      41.9         65            75            83
Buckwheat    75.5      14          11      81.2         17            11            11
Rice         85.8      26          22      95.2         36            25            33

Index, 

$$100 \times \sqrt{\frac{2{,}218}{2{,}890} \times \frac{2{,}539}{3{,}360}} = 76.$$



The columns for 1913 are the same as in Table LXXXIII. 
For 1921 two additional columns are necessary, increasing the 
amount of computation considerably. The computation in the 
formula is also increased by one division, one multiplication, and 
a root extraction. The index number turns out to be 76, which is 
one less than by the other formula. Granting that 76 is more 
accurate, the question arises as to whether the correction of the 
error is large enough to compensate for the extra labor. Quite 
likely the error is no greater than errors in the original data. 
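
The comparison of the two formulas can be repeated from the four column totals alone. The Python sketch below takes those totals as given in the computation of the index (2,890, 2,218, 2,539, and 3,360); the variable names are illustrative assumptions.

```python
import math

# Column totals (each already divided by 100) used in the text.
sum_p0_q0 = 2890   # 1913 prices, 1913 quantities
sum_p1_q0 = 2218   # 1921 prices, 1913 quantities
sum_p1_q1 = 2539   # 1921 prices, 1921 quantities
sum_p0_q1 = 3360   # 1913 prices, 1921 quantities

fisher_53 = 100 * sum_p1_q0 / sum_p0_q0
fisher_59 = 100 * sum_p1_q1 / sum_p0_q1
ideal = math.sqrt(fisher_53 * fisher_59)   # geometric cross of 53 and 59

print(round(fisher_53), round(fisher_59), round(ideal))
```

The Ideal needs the two extra totals, one more division, one multiplication, and the square root, which is the extra labor weighed above against a difference of one point.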

The computer can easily make his own table of computations 
for any index number formula that he desires to use, or for any 
combination of formulas. 

24. Graphs are Valuable.—It is illuminating to graph a series 
of index numbers over a period of time and extend it periodically, 
keeping the graph up to date. 

25. Warning as to Use of Index Numbers.—Index numbers 
and their graphs are becoming more and more popular and 
more diversified in their uses. It is, therefore, important to 
have an understanding of the principles involved in their 
making. One should have information in regard to the data 
used in forming the index number and the formula used in 
computing it. 

One should not apply a wholesale price index to retail prices. 
A price index based on one set of commodities will likely vary 
from that based on another set of commodities. Index numbers 
on prices in one market are likely to vary from those on prices 
from another market. If the same set of commodities is used, but 
with different weights, the resulting index numbers should not 
be compared except with caution. If two index numbers are 
derived from different bases, the numbers may not be compared 
without some means of reducing both to the same base. The 
common method used, if it is desired to reduce index numbers of 
different bases to a single base, say 1913, is to divide each index 
of any series by the 1913 index of that series and reduce to per cent. 

This leads to error in some formulas. For example, the Annalist’s 
index of wholesale food prices is based on the average of the 
decade 1890 to 1899. The index for 1913 is 139.980; for Aug. 2, 
1924, it is 184.798. To reduce this latter to 1913 as base, divide 
184.798 by 139.980. This gives 132.017. Index numbers 
should not be compared when derived from different formulas 
without making allowance for the bias of the formula. 
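
The change of base described above is a single division. A minimal Python sketch, with `rebase` as an assumed illustrative name, reproduces the Annalist figure:

```python
def rebase(index_value, index_of_new_base_year):
    """Express an index in per cent of its value in the new base year."""
    return 100 * index_value / index_of_new_base_year


# Annalist index: 139.980 for 1913 and 184.798 for Aug. 2, 1924,
# both on the 1890 to 1899 average as base.
rebased = rebase(184.798, 139.980)
print(round(rebased, 3))
```

As the text warns, this shortcut is exact only for formulas whose results are unaffected by the choice of base.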

26. An Index Number is a Type Form.—An index number, 
like any other average, is only a type form. It is the result 
of an effort to represent somewhat complex and varying data by 
a single representative type number. While some prices are 
going up and others going down, while some quantities are 
increasing and others diminishing, the index cannot be a definite 
price, but is merely a pointer to show the general trend. 

27. Selection of Commodities Used is Important.—One of the 
most important prerequisites for a good index number is the 
selection of the set of commodities used. A large number of 
commodities is not so important as the selection of a representa¬ 
tive group for the particular purpose at hand. Empirically, it is 
found that the particular formulas used in most cases differ but 
little in results. There are some inherently bad formulas, such 
as the average price relative. The set of weights used is not 
found to make so much difference in practice as do the commodi¬ 
ties selected. 

28. Value of Statistical Method.—With index numbers, as 
with all other statistical work, reliable results, properly inter¬ 
preted, are to be expected only from those persons who have 
made themselves familiar with the principles on which statistical 
method depends. At least an elementary knowledge of the 
mathematical theory involved is essential to trustworthy results 
and interpretations. 


Exercises 

Production of Food Crops in the United States in Millions of Bushels 

Year    Wheat    Corn     Oats    Barley   Potatoes   Sweet potatoes
1913      763   2,447    1,122       178        332               59
1915    1,025   2,995    1,549       229        360               77
1917      637   3,065    1,593       212        442               84



Average Prices Received by Producers in Cents per Bushel 

Year    Wheat    Corn     Oats    Barley   Potatoes   Sweet potatoes
1913       80      69       39        54         69               7b
1915       93      66       45        54         50               79
1917      162      90       51        87        149               90


1. Using the table of food prices of the six food crops given above, find the 
average price relative for each year on 1913 as a base year. Apply the time 
reversal test and see whether the results are consistent with the index 
numbers found. 

2. Using the table of prices and the table of amounts produced, as given 
above, find an index number for 1915 and for 1917 on 1913 as base, by means 
of Fisher’s 52, $w_k / w_0$. 

3. Using the same tables, find an index for 1915 and one for 1917 on 1913 
as base, by means of Fisher’s 53, $\sum p_k q_0 / \sum p_0 q_0$. 

4. Using the same tables find Fisher’s Ideal index for 1915 and for 1917 
using 1913 as base. 

An abundance of material for problems may be obtained from the large 
newspaper almanacs. “The Statistical Abstract of the United States” also 
affords much material. Professor Irving Fisher’s “The Making of Index 
Numbers” has a list of prices and quantities of 36 commodities which he 
used for his illustrations. Selections from these lists afford very good 
examples. 





APPENDIX A 


LOGARITHMS 

A logarithm is an exponent. The laws of logarithms are the 
laws of exponents as developed in algebra. 

1. Definition.—The logarithm of a number N is that exponent 
which, operating on a certain specified number, will produce the 
given number N. The specified number upon which the expo¬ 
nent operates is called the base of the system of logarithms. The 
base of the common or Briggs system of logarithms is 10. The 
symbol commonly used for logarithm is log. The base is indicated 
as a subscript to the symbol “log.” Thus, $\log_{10} 1{,}000 = 3$ 
is read: The logarithm of 1,000 to the base 10 is 3. The logarithm 
of 1,000 to the base 10 equals 3, since 10 operated on by 
the exponent 3 gives 1,000. Thus 

$$10^3 = 1{,}000.$$

In general, 

$$\log_{10} N = x$$

and 

$$10^x = N$$

are equivalent equations stating, in different form, the same 
relationship between x, N, and 10. 

2. Laws of logarithms.—If, now, 

$$N = 10^x$$

and 

$$M = 10^y,$$

then, by definition of logarithm, 

$$\log_{10} N = x$$

and 

$$\log_{10} M = y.$$

Moreover, by the laws of exponents, 

$$10^x \times 10^y = 10^{x+y}.$$

Therefore, 

$$NM = 10^{x+y}$$

and 

$$\log_{10} NM = x + y,$$

the exponent of 10 to give NM. 

But 

$$x = \log_{10} N$$

and 

$$y = \log_{10} M;$$

therefore, 

$$\log_{10} NM = \log_{10} N + \log_{10} M.$$

This last equation translated into English reads: 

(1) The logarithm of the product of two numbers is the sum of 
their logarithms. 

Similarly, 

$$N \div M = 10^x \div 10^y = 10^{x-y}.$$

Then, 

$$\log_{10} \frac{N}{M} = x - y,$$

or 

$$\log_{10} \frac{N}{M} = \log_{10} N - \log_{10} M.$$

This translated into English is: 

(2) The logarithm of the quotient of two numbers is the logarithm 
of the dividend minus the logarithm of the divisor. 

This same law may be stated: 

The logarithm of a fraction is the logarithm of the numerator 
minus the logarithm of the denominator. 

Further, since, by the laws of exponents, 

$$N^z = (10^x)^z = 10^{xz},$$

it follows that 

$$\log_{10} N^z = xz$$

or 

$$\log_{10} N^z = z \log_{10} N.$$

Translated into English this equation reads: 

(3) The logarithm of a power of a number is the index of the 
power times the logarithm of the number. 

3. Negative, Fractional, and Zero Exponents.—It is proved in 
algebra that the laws of exponents hold whether the exponents 
are positive or negative, whole numbers or fractions, if proper 
interpretation is placed on the various kinds of exponents. The 
following interpretations should be recalled: 

$$N^0 = 1, \qquad (a)$$

$$N^{-x} = \frac{1}{N^x}, \qquad (b)$$

$$N^{p/q} = \sqrt[q]{N^p}. \qquad (c)$$

From these interpretations and the laws of logarithms just 
derived 

$$\log_{10} 1 = 0,$$

because 

$$10^0 = 1, \qquad (a)$$

and 

$$\log_{10} \frac{1}{N} = \log_{10} N^{-1} \qquad (b)$$

$$= -\log_{10} N, \qquad (3)$$

$$\log_{10} \frac{1}{N^x} = -x \log_{10} N, \qquad (b) \text{ and } (3)$$

$$\log_{10} \sqrt[q]{N^p} = \log_{10} N^{p/q} \qquad (c)$$

$$= \frac{p}{q} \log_{10} N. \qquad (3)$$

For example 

$$\log_{10} \tfrac{1}{20} = -\log_{10} 20,$$

$$\log_{10} \sqrt{20} = \tfrac{1}{2} \log_{10} 20,$$

$$\log_{10} \sqrt[3]{20^2} = \tfrac{2}{3} \log_{10} 20.$$

4. Base 10.—Since counting goes by powers of 10, the con¬ 
venient base to use for computation purposes is 10. In ordinary 
computations where 10 is understood to be the base, it is unnec¬ 
essary to write the base and only the symbol “log” is used. 

The values of the following logarithms to the base 10 are seen 
at once by inspection: 


log 10,000 = 4, 
log 1,000 = 3, 
log 100 = 2, 
log 10 = 1, 
log 1=0, 
log 0.1 = -1, 
log 0.01 = -2, 
log 0.001 = -3. 
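
The three laws and the values just listed can be checked with any modern means of computing logarithms. The following is a small Python sketch using the standard `math.log10` function; the particular numbers N, M, and z are arbitrary choices for the illustration.

```python
import math

N, M, z = 20.0, 50.0, 3.0   # arbitrary positive numbers for the check

# (1) logarithm of a product is the sum of the logarithms
law_1 = math.isclose(math.log10(N * M), math.log10(N) + math.log10(M))
# (2) logarithm of a quotient is the difference of the logarithms
law_2 = math.isclose(math.log10(N / M), math.log10(N) - math.log10(M))
# (3) logarithm of a power is the index times the logarithm
law_3 = math.isclose(math.log10(N ** z), z * math.log10(N))

# Logarithms of integral powers of 10, as in the list above.
powers = {number: round(math.log10(number)) for number in (1000, 100, 10, 1, 0.1, 0.01)}

print(law_1, law_2, law_3)
print(powers)
```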

5. Mantissa and Characteristic.—Now log 20 must be more 
than 1 and less than 2, since 20 lies between 10 and 100. In 
other words log 20 is 1 and a fraction. Log N is not an integer 
unless N is an integral power of 10. If N is not an integral power 
of 10, then log N is fractional. The decimal part of the value of 
a logarithm is called the mantissa of the logarithm. The part 
before the decimal point is called the characteristic. Tables of 
logarithms usually give mantissas only. Looking in a six-place 
table of logarithms the mantissa for 20 is found to be 301,030. 


Now write 

log 20 = 1.301030. 

Since 

200 = 10 X 20, 
log 200 = log 10 + log 20          (1) 
        = 1 + 1.301030, 

or, 

log 200 = 2.301030. 

Similarly, 

log 2,000 = 3.301030. 

Also, since 

2 = 20 ÷ 10, 
log 2 = log 20 - log 10            (2) 
      = 0.301030. 

Moreover, 

0.2 = 2 ÷ 10, 

so 

log 0.2 = log 2 - log 10 
        = 0.301030 - 1 
        = 9.301030 - 10 
        = -0.698970. 



The second form, 9.301030 - 10, is usually the most convenient 
to use in making computations. It is sometimes written $\bar{1}.301030$. 
The minus sign being placed over the 1 indicates that it belongs 
with the 1 alone and not with 301030. This method of indicating 
a negative logarithm causes inconvenience when several 
logarithms are to be added or subtracted and some of them are 
negative. 

Further, since 

0.02 = 0.2 ÷ 10, 
log 0.02 = log 0.2 - log 10 
         = 0.301030 - 2 
         = 8.301030 - 10. 

Similarly, 

log 0.002 = 7.301030 - 10. 

Notice that the mantissa is the same, 301,030, for 2,000, 200, 20, 
2, 0.2, 0.02, and 0.002. The number 2 multiplied by any inte¬ 
gral power of 10 must have the mantissa 301,030. The character¬ 
istic will be the index of the power of 10 used. 

Thus: 

$2{,}000 = 2 \times 10^{3}$, 

$200 = 2 \times 10^{2}$, 

$20 = 2 \times 10^{1}$, 

$2 = 2 \times 10^{0}$, 

$0.2 = 2 \times 10^{-1}$, 

$0.02 = 2 \times 10^{-2}$, 

$0.002 = 2 \times 10^{-3}$, 

and so on. The characteristic of the logarithm of 

2,000 is 3, 

200 is 2, 

20 is 1, 

2 is 0, 

0.2 is —1, 

0.02 is -2, 

0.002 is -3, 

and so on. Consideration of the above figures will show that, 
necessarily, the characteristic of the logarithm of a number greater 
than 1 will be one less than the number of digits to the left of the 
decimal point. The characteristic of the logarithm of a number 
less than 1 will be negative, and numerically one greater than the 

number of zeros between the decimal point and the left-hand 
figure of the number. 

6. Rule for the Characteristic.—These two statements may be 

combined into one rule for determining the characteristic of the 
logarithm of a number. 

To determine the characteristic of the logarithm of a number, 
begin at units place and count to that figure, not zero, which is 
farthest to the left. Count, not 1, 2, 3, etc., but 0, 1, 2, 3, etc. 
The count upon arriving at the left-hand figure is the characteristic 
of the logarithm. It is positive if the counting proceeded 
to the left and negative if the counting proceeded to the right. 



Applying this rule, the characteristic of the logarithm of 

4,763.21 is 3, 

160.00 is 2, 

39.98 is 1, 

3.108 is 0, 

0.4600 is -1 = 9 - 10, 

0.0460 is -2 = 8 - 10, 

0.0046 is -3 = 7 - 10, 

and so on. 
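
The rule amounts to taking the greatest integer not exceeding the logarithm, with the mantissa as the positive remainder. A short Python sketch under that reading follows; the function names `characteristic` and `mantissa` are assumptions for the illustration.

```python
import math


def characteristic(number):
    """Integral part of the common logarithm, by the counting rule."""
    return math.floor(math.log10(number))


def mantissa(number, places=6):
    """Decimal part of the common logarithm, always taken as positive."""
    return round(math.log10(number) - characteristic(number), places)


examples = [4763.21, 160.00, 39.98, 3.108, 0.4600, 0.0460, 0.0046]
found = [characteristic(x) for x in examples]
print(found)

# The mantissa depends only on the sequence of figures.
print(mantissa(20), mantissa(0.002))
```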

7. Mantissa Independent of the Decimal Point.—Since moving 
the decimal point in any number merely multiplies or divides the 
number by an integral power of 10, the logarithm of the number 
will be changed only by the index of the power of 10 used. That 
is, since moving the decimal point of a number two places to the 
right multiplies by $10^2$, the logarithm of the number is increased 
by 2. Or, since moving the decimal point of a number three 
places to the left divides by $10^3$, the logarithm of the number is 
diminished by 3. It follows that the mantissa for any sequence of 
figures remains the same no matter where the decimal point is 
placed. 

Characteristics are determined by the position of the decimal 
point. Mantissas are determined by the sequence of figures. For 
this reason, tables of logarithms need tabulate mantissas only. 

Tables of mantissas are commonly printed to 4, 5, 6, 7, or 8 
figures. The greater the number of figures given the greater the 
degree of accuracy of computation attainable in their use. In 
general, n-place tables give about n-place accuracy in the 
computations. 

8. Use of the Table.—On pages 349 to 352 is printed a four- 
place table. To find the logarithm of 467.8, proceed as follows: 
Write down the characteristic which is seen by inspection to be 
2. Follow down the N column of the table to the line where 
occurs the first two figures, 46, of the number. Follow to the 
right on this line to the column whose heading is 7, the third figure 
of the number. Here is found 6,693. This is the mantissa for 
the sequence of figures 4,670. On the same line in the next 
column is found 6,702. This is the mantissa for the sequence of 
figures 4,680. The difference between these two mantissas is 

6,702 - 6,693 = 9. 

Thus for an increase in the number of 

4,680 - 4,670 = 10, 


the mantissa increases 9. Our sequence of figures, 4,678, is 8 
more than 4,670. If an increase of 10 in the number causes an 
increase of 9 in the mantissa, an increase of 8 in the number will 
give an increase of 8/10 of 9 in the mantissa, or 7.2. This 
increase is nearer to 7 than 8; so increase the mantissa by 7, 
giving 

6,693 + 7 = 6,700 

for the mantissa of the sequence 4,678. 

The characteristic 2 has already been determined and 
written down. Now annex the mantissa, giving 

log 467.8 = 2.6700. 
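
The proportional-parts step can be written out as a small Python sketch. The two tabular mantissas are those quoted in the text; the function name `interpolate_mantissa` is an assumption for the illustration.

```python
import math


def interpolate_mantissa(figures, low_figures, low_mantissa, high_figures, high_mantissa):
    """Linear interpolation between two adjacent entries of the table."""
    fraction = (figures - low_figures) / (high_figures - low_figures)
    return low_mantissa + fraction * (high_mantissa - low_mantissa)


# Table entries quoted in the text: 4,670 -> 6,693 and 4,680 -> 6,702.
exact = interpolate_mantissa(4678, 4670, 6693, 4680, 6702)
four_place = round(exact)              # keep four places, as the table does
log_467_8 = 2 + four_place / 10000     # characteristic 2, then the mantissa

print(round(exact, 1), four_place, log_467_8)
print(math.log10(467.8))               # value computed directly, for comparison
```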

9. Interpolation.—Finding the mantissa for a number between 
two numbers as given in the table is called interpolation. Interpolation 
by the above method assumes uniform variation from 
one mantissa of the table to the next. It is illustrated in Fig. 66. 

Numbers are laid off on the horizontal scale and mantissas 
on the vertical. FA = 6,693 is the mantissa for 4,670, and HE 
= 6,702 is the mantissa for 4,680. GC, the mantissa for 4,678, is 
to be found. To do this, find 

x = BC, 

the amount to be added to 

GB = FA = 6,693. 

DE = HE - FA = 6,702 - 6,693 = 9. 

From similar triangles 

$$\frac{x}{DE} = \frac{AB}{AD},$$

or 

$$x = \frac{4{,}678 - 4{,}670}{4{,}680 - 4{,}670} \times 9 = \frac{8}{10} \times 9 = 7.2.$$

Then, 

GC = 6,693 + 7 = 6,700, 
the required mantissa. 

10. The Inverse Process.—If a logarithm is given as 

log N = 9.4823 - 10, 

find N, the number it is the logarithm of, by use of the table. 
This number N is called the antilogarithm and is written thus: 


N = antilog 9.4823 - 10 
or 

$$N = \log^{-1} 9.4823 - 10.$$

To find N, take the mantissa, 4,823, and find the two nearest 
mantissas, one less and the other greater than 4,823, namely, 
4,814 and 4,829; 4,814 is seen to be the mantissa for 3,030; 4,829 
is the mantissa for 3,040. The difference of these mantissas is 

4,829 - 4,814 = 15. 

The difference between the given mantissa and the smaller 
of the two from the table is 

4,823 - 4,814 = 9. 

For a change of 15 in mantissas, there is here a change of 10 in 
the corresponding number sequence. Then, for a change of 9 in 
mantissas, there will be a change in number sequence of 

9/15 of 10 = 6. 

The correct number sequence to four figures is, then, 

3,030 + 6 = 3,036. 

Since the characteristic of the given logarithm is 9 - 10 = 

-1, the decimal point must be immediately preceding this 

number sequence and 

N = 0.3036 = log" 1 9.4823 - 10. 


LOGARITHMS 


303 


This process of interpolation is the inverse of that used before.
Now GC is given, and FA and HE are found from the table. FH
= 10, and FG = AB is to be found.
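
The inverse process may be sketched the same way. This is a minimal illustration under the assumption of a four-figure table; `inverse_interpolate` and the variable names are hypothetical.

```python
def inverse_interpolate(m_lo, n_lo, m_hi, n_hi, m):
    """Number sequence whose mantissa is m, by linear interpolation."""
    return n_lo + (m - m_lo) / (m_hi - m_lo) * (n_hi - n_lo)


# Worked example: log N = 9.4823 - 10, so the mantissa is 4,823.
digits = round(inverse_interpolate(4814, 3030, 4829, 3040, 4823))

# The characteristic 9 - 10 = -1 places the decimal point
# immediately before the four-figure sequence.
characteristic = -1
n = digits / 10 ** (3 - characteristic)
print(digits, n)
```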

11. Other Bases.—In order to have a system of logarithms, it is
not necessary that the base be 10. If 2 is taken as base, then

\[ \log_2 8 = 3, \]

since

\[ 2^3 = 8. \]

Since

\[ 9^3 = 729 = 27^2, \]

\[ \log_9 729 = 3, \qquad \log_{27} 729 = 2. \]


It is evident that there is some relation between the logarithm
of a number to one base and its logarithm to another base. To
find the relation between $\log_a N$ and $\log_b N$, let

\[ \log_a N = x, \qquad \log_b N = y, \]

and

\[ \log_a b = c. \]

Then

\[ N = a^x, \qquad N = b^y, \qquad b = a^c. \]

Since

\[ b = a^c, \qquad b^y = a^{cy}. \]

Now

\[ a^x = b^y = N, \]

and

\[ a^x = a^{cy}. \]

So

\[ x = cy, \]

or

\[ \log_a N = \log_a b \cdot \log_b N, \]

whence

\[ \log_b N = \frac{\log_a N}{\log_a b}. \]




From this relation, if a table of logarithms to the base a is 
given, it is possible at once to get logarithms to the base b. Take 
N = a and substitute a for N in the above relation; then 


\[ \log_b a = \frac{\log_a a}{\log_a b}. \]

Since $\log_a a$ must equal 1, this gives

\[ \log_b a = \frac{1}{\log_a b}. \]

That is, $\log_b a$ and $\log_a b$ are reciprocals of each other.
So that

\[ \log_b N = \log_a N \cdot \log_b a. \]
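
As a numerical check of the change-of-base relation, the following Python sketch takes a = 10 as the tabulated base and uses the standard library in place of a table. The function name `log_base` is an assumption made for illustration.

```python
import math


def log_base(n, b):
    """log_b N obtained from base-10 logarithms: log_10 N / log_10 b."""
    return math.log10(n) / math.log10(b)


print(log_base(8, 2))      # about 3, since 2**3 = 8
print(log_base(729, 9))    # about 3, since 9**3 = 729
print(log_base(729, 27))   # about 2, since 27**2 = 729

# log_b a and log_a b are reciprocals of each other.
print(log_base(10, 2) * math.log10(2))  # about 1
```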

12. Natural or Napierian Logarithms.—Besides the common or 
Briggsian system to the base 10, there is used in theoretical work 
another system known as the natural, hyperbolic, or Napierian 
system of logarithms. The base of this system is an incommensurable
number indicated by the letter e. To eight figures

e = 2.7182818. 


Tables of logarithms are derived by means of certain infinite 
series which give logarithms to the base e. From logarithms to 
the base e, logarithms to the base 10 may be found by the relation 
just determined. 

\[ \log_{10} N = \log_e N \cdot \log_{10} e. \]

\[ \log_{10} e = \log_{10} 2.71828 = 0.434294. \]

Then

\[ \log_{10} N = 0.434294 \log_e N. \]

13. Modulus.—This number, 0.434294, is known as the
modulus of the common system. Its reciprocal, $\log_e 10$ =
2.30259, is the modulus of the natural system.

Having natural logarithms, common logarithms are obtained 
by multiplying by the modulus of the common system. Having 
common logarithms, natural logarithms may be obtained by 
multiplying by the modulus of the natural system. For example, 
from tables of natural logarithms, 

\[ \log_e 5 = 1.60944. \]

Then

\[ \log_{10} 5 = 0.434294 \times 1.60944 = 0.698970. \]

From tables of common logarithms,

\[ \log_{10} 5 = 0.698970. \]




Then 


\[ \log_e 5 = 2.30259 \times 0.698970 = 1.60944. \]
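
The two conversions can be checked with a short Python sketch. The constant and function names are illustrative assumptions; the library values stand in for the printed tables.

```python
import math

MODULUS_COMMON = math.log10(math.e)  # about 0.434294
MODULUS_NATURAL = math.log(10)       # about 2.30259, the reciprocal


def natural_to_common(ln_n):
    """Common logarithm from a natural logarithm."""
    return MODULUS_COMMON * ln_n


def common_to_natural(log_n):
    """Natural logarithm from a common logarithm."""
    return MODULUS_NATURAL * log_n


print(natural_to_common(math.log(5)))    # about 0.698970
print(common_to_natural(math.log10(5)))  # about 1.60944
```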


14. Cologarithm.—The logarithm of the reciprocal of a number
is called the cologarithm of the number. Thus

\[ \log \frac{1}{N} = \operatorname{colog} N. \]

From this equation, since

\[ \log \frac{1}{N} = \log 1 - \log N = 0 - \log N, \]

it follows that

\[ \operatorname{colog} N = -\log N. \]


Therefore, whenever in computations log N is to be subtracted,
it serves the same purpose to add colog N. To find
colog N, subtract log N from 0 in the form of 10 - 10. Thus

log 20 = 1.3010.

To get colog 20, subtract as follows:

           10.0000 - 10
            1.3010

colog 20 =  8.6990 - 10.

log 0.0341 = 8.5328 - 10.

To get colog 0.0341 subtract from 10 - 10, thus:

               10.0000 - 10
                8.5328 - 10

colog 0.0341 =  1.4672

It is a simple matter to write down colog N from log N itself. 
Begin at the characteristic and subtract each figure from 9 except 
the figure farthest to the right that is not 0. Subtract this last 
figure from 10. Write down the results of subtraction as you 
proceed. If log N is positive, —10 must be written after the 
result of the subtractions. If log N has —10 written as part of 
it, there is no —10 with colog N. 
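
A minimal Python sketch of the cologarithm follows. It computes colog N directly as -log N rather than by the figure-by-figure rule, and the helper `table_form` (a hypothetical name) writes a negative logarithm in the "- 10" form used above.

```python
import math


def colog(n):
    """Cologarithm: the common logarithm of 1/n."""
    return -math.log10(n)


def table_form(value, places=4):
    """Write a logarithm with a non-negative leading part,
    appending '- 10' when the value itself is negative."""
    if value >= 0:
        return f"{value:.{places}f}"
    return f"{value + 10:.{places}f} - 10"


print(table_form(colog(20)))      # 8.6990 - 10
print(table_form(colog(0.0341)))  # positive, so no '- 10' is written
```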




15. Examples.—A few illustrative examples of logarithmic 
computation are given below: 

(1)   x = 245.6 × 1.391.

log x = log 245.6 + log 1.391.

log 245.6 = 2.3902
log 1.391 = 0.1433

    log x = 2.5335

        x = 341.6



(2)   x = 2.492 ÷ 0.1265.

log x = log 2.492 - log 0.1265.

 log 2.492 = 0.3965
log 0.1265 = 9.1021 - 10

     log x = 1.2944
         x = 19.70


The table gives the mantissa for 1,970 as 2,945; 2,944 is so close 
to 2,945 that the interpolation gives a sequence of figures closer 
to 1,970 than to 1,969. 

This division may be performed by the equation: 

log x = log 2.492 + colog 0.1265.

     log 2.492 = 0.3965
  colog 0.1265 = 0.8979

         log x = 1.2944
             x = 19.70



(3)
\[ x = \frac{0.0345 \sqrt{64.8}}{0.00374 \sqrt[3]{9.34}}. \]

log x = log 0.0345 + 1/2 log 64.8 + colog 0.00374 + 1/3 colog 9.34.

      log 0.0345 = 8.5378 - 10
    1/2 log 64.8 = 0.9058
   colog 0.00374 = 2.4271
  1/3 colog 9.34 = 9.6766 - 10

           log x = 1.5473
               x = 35.3.

Facility in the use of logarithms is attained only by practice. 
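
The examples of this section can be checked numerically. In the Python sketch below, the standard library's `math.log10` stands in for the four-place table, so the results agree with the text only to about four figures; the names `log`, `antilog`, `x1`, `x2`, and `x3` are illustrative.

```python
import math

log = math.log10


def antilog(y):
    """Number whose common logarithm is y."""
    return 10 ** y


# A product becomes a sum of logarithms.
x1 = antilog(log(245.6) + log(1.391))

# A quotient becomes a difference (or log plus colog).
x2 = antilog(log(2.492) - log(0.1265))

# A root becomes a fraction of the logarithm.
x3 = antilog(log(0.0345) + log(64.8) / 2 - log(0.00374) - log(9.34) / 3)

print(x1, x2, x3)
```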


APPENDIX B 


PERMUTATIONS AND COMBINATIONS. BINOMIAL 

EXPANSION 

If a group of r things is selected from a set of n things, this
group of r things is called a combination. The question at once
arises as to how many different groups of r things it is possible to
select from n things. In other words, it is desired to know the
number of combinations of n things taken r at a time. The symbol
for this number of combinations is ${}_nC_r$.

In any set or combination of r things, the things may be arranged
in different orders. Each arrangement of the r things is called a
permutation of the r things. The symbol for the number of
permutations of n things taken r in each set is ${}_nP_r$.

From the four letters a, b, c, d, may be selected the following
sets of three each, abc, abd, acd, and bcd. There being four and
only four of these possible sets, ${}_4C_3 = 4$.

In any one of these sets or combinations of letters, such as abc,
for example, the three letters may take any one of the following
arrangements, abc, acb, bac, bca, cab, or cba, six in number. Each
of these arrangements is a permutation of the three letters. The
three letters in each of the four combinations have six
permutations. Then for all four combinations there are 4 × 6 = 24
different permutations. So ${}_4P_3 = 24$. To find the value of
${}_nC_r$ and ${}_nP_r$, first consider the following theorem.

1. Fundamental Theorem .—If there are p ways of performing 
one act and q ways of performing an act associated with it, then there 
are pq ways in which the two associated acts may be performed. 

This is shown by the following illustration. If there are five 
roads from New York to Chicago and four roads from Chicago to 
Seattle, then there are 5 X 4 = 20 routes from New York to 
Seattle by way of Chicago. This is true since there are five 
choices of road from New York to Chicago, and with any one of 
these five may be associated any one of four roads from Chicago 
to Seattle, making 5X4 ways to travel from New York to Seattle 
by way of Chicago. Similarly, if there are p roads from New York 



to Chicago and q roads from Chicago to Seattle, there are p 
choices of routes from New York to Chicago, and with the one 
chosen may be associated any one of q roads to finish the trip, 
making pq choices of route from New York to Seattle by way of 
Chicago. 

If, in addition, there are three roads from Seattle to San 
Francisco, then any one of the 20 ways from New York to 
Seattle may be associated with any of the three ways to San 
Francisco, resulting in 20 X3 = 5X4X3= 60 choices of 
route from New York to San Francisco by way of Chicago and 
Seattle. 
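
The road illustration can be enumerated directly. In this Python sketch the roads are simply numbered; `itertools.product` forms every association of one choice from each stage, and the variable names are assumptions made for the example.

```python
from itertools import product

ny_to_chicago = range(5)        # five roads
chicago_to_seattle = range(4)   # four roads
seattle_to_sf = range(3)        # three roads

routes_to_seattle = list(product(ny_to_chicago, chicago_to_seattle))
routes_to_sf = list(product(ny_to_chicago, chicago_to_seattle, seattle_to_sf))

print(len(routes_to_seattle))  # 5 x 4 = 20
print(len(routes_to_sf))       # 5 x 4 x 3 = 60
```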

2. Value of ${}_nP_r$.—In order to find the value of ${}_nP_r$, suppose
there are r compartments to be filled by selections of r things
from n things, placing one thing in each compartment. (See
figure.)

[Figure: a row of r compartments, labeled 1st, 2nd, 3rd, 4th, . . . , rth.]


The first compartment may be filled with any one of the n
things. That is, there are n choices of the thing with which to
fill the first compartment. When this compartment is filled,
there are n - 1 things left from which to select one to fill the
next compartment. So, by the theorem just stated, the number
of ways in which the first and second compartments may be filled
is n(n - 1). With two compartments filled, there are n - 2
things left from which to select one to fill the third compartment.
The three compartments may then be filled in n(n - 1)(n - 2)
ways. Similarly, the first four may be filled in
n(n - 1)(n - 2)(n - 3) ways. It is at once seen that the number
of ways the r compartments may be filled is the continued product
n(n - 1)(n - 2) . . . (n - r + 1). The number of ways in which the
r compartments may be filled is ${}_nP_r$. So

\[ {}_nP_r = n(n-1)(n-2) \cdots (n-r+1). \]

It follows at once that the number of permutations
of n things taken all, or n, at a time is
n(n - 1)(n - 2) . . . 3·2·1. This product of the natural numbers
from 1 to n inclusive is called factorial n. The symbol for
factorial n is $n!$ or $\lfloor n$. Thus

\[ 5! = 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 = 120. \]

Then,

\[ {}_nP_n = n!. \]




In the product n(n - 1)(n - 2) . . . (n - r + 1) there
are all of the factors in $n!$ except 1·2·3 . . . (n - r) = $(n-r)!$.

Therefore,

\[ {}_nP_r = \frac{n!}{(n-r)!}. \]

From this formula,

\[ {}_4P_3 = \frac{4!}{1!} = 24, \]

as was previously stated.
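
As a check on the formula, the following Python sketch compares it with a direct listing of the arrangements. The function name `n_p_r` is an assumption; `itertools.permutations` does the listing.

```python
from itertools import permutations
from math import factorial


def n_p_r(n, r):
    """Number of permutations of n things taken r at a time: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


arrangements = ["".join(p) for p in permutations("abcd", 3)]
print(n_p_r(4, 3), len(arrangements))  # both 24
```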

3. Value of ${}_nC_r$.—If a combination of r things be selected,
these r things have $r!$ permutations among themselves, since
${}_rP_r = r!$. Each of the combinations, then, of n things taken r
at a time, has $r!$ permutations among the things in the
combination. It follows, at once, that

\[ {}_nC_r \cdot r! = {}_nP_r. \]

Therefore,

\[ {}_nC_r = \frac{{}_nP_r}{r!}, \]

or

\[ {}_nC_r = \frac{n!}{(n-r)!\, r!}. \]

From this formula

\[ {}_4C_3 = \frac{4!}{1!\, 3!} = 4, \]

as was previously obtained.

If, from a set of n things, all n of them are selected, there is, of
course, only one such combination. That is, ${}_nC_n = 1$. By the
formula for number of combinations,

\[ {}_nC_n = \frac{n!}{(n-n)!\, n!}. \]

Since this must equal 1, the formula is consistent only by assuming
$(n-n)! = 1$. That is, $0! = 1$.
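
The combination formula can be checked against a direct listing. In this Python sketch, `n_c_r` is an illustrative name; note that the standard library's `factorial(0)` returns 1, in agreement with the convention just stated.

```python
from itertools import combinations
from math import factorial


def n_c_r(n, r):
    """Number of combinations of n things taken r at a time."""
    return factorial(n) // (factorial(n - r) * factorial(r))


sets_of_three = ["".join(c) for c in combinations("abcd", 3)]
print(sets_of_three)   # ['abc', 'abd', 'acd', 'bcd']
print(n_c_r(4, 3))     # 4
print(n_c_r(6, 6))     # 1, which relies on 0! = 1
```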

4. A Simple Relationship.—${}_nC_r = {}_nC_{n-r}$. This is seen by
applying the formula to each member of the equation.

\[ {}_nC_r = \frac{n!}{(n-r)!\, r!}, \]

\[ {}_nC_{n-r} = \frac{n!}{[n-(n-r)]!\, (n-r)!} = \frac{n!}{r!\, (n-r)!}. \]

This last result is the same as the first; therefore, ${}_nC_r = {}_nC_{n-r}$.

This is an important theorem. 

5. Binomial Expansion.—When $(a + b)^n$ is multiplied out,
the resulting series is known as the binomial series and the
process of getting the series is the binomial expansion. Such simple
cases as

\[ (a+b)^2 = a^2 + 2ab + b^2 \]

and

\[ (a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3 \]

are well known in elementary algebra. In multiplying out
$(a + b)^n$, the expression becomes $(a+b)(a+b)(a+b) \cdots$
to n factors. To write down the result, all combinations of
products that can be obtained by taking one letter from each factor
must be obtained and all these products added together. Taking
a from each factor gives $a^n$. Taking a from each factor except
the first gives $a^{n-1}b$. Taking a from each factor except the
second also gives $a^{n-1}b$. Similarly, taking a from each factor
but one gives $a^{n-1}b$ as many times as there are factors, or n times.
The sum of these products is $na^{n-1}b$. A simple manner of
looking at this process is to consider it as taking $a^{n-1}b$ as many times
as the number of times it is possible to select one thing from n
things, that is ${}_nC_1$. This is true since b is selected from one
factor in as many ways as there are factors. Next consider in
how many ways a may be selected from all but two factors and
b be selected from the other two. This must be ${}_nC_2$. Therefore,
in the expansion will occur ${}_nC_2\, a^{n-2}b^2$.

Similarly, the number of ways a may be selected from all but
three factors and b be selected from the remaining three is ${}_nC_3$,
and the term ${}_nC_3\, a^{n-3}b^3$ occurs in the expansion. For a selected
from all but four factors and b from the remaining four, the term
${}_nC_4\, a^{n-4}b^4$ occurs. This process will be continued to the point
where a is selected from one factor and b from all the rest, giving
the term ${}_nC_{n-1}\, ab^{n-1}$. Only one more term will follow, namely,
that obtained by selecting b from each factor, giving the last
term, $b^n$. The terms that are thus obtained, being added
together, give the series:

\[ (a+b)^n = a^n + {}_nC_1\, a^{n-1}b + {}_nC_2\, a^{n-2}b^2 + {}_nC_3\, a^{n-3}b^3 + \cdots + {}_nC_{n-1}\, ab^{n-1} + b^n. \]

Since ${}_nC_r = {}_nC_{n-r}$, it follows that the coefficient of the second
term equals the coefficient of next to the last term: ${}_nC_1 = {}_nC_{n-1}$.
Also the coefficient of the third term equals that of the third
term from the end: ${}_nC_2 = {}_nC_{n-2}$. The same relation holds
throughout the series. Terms equally distant from the beginning 
and end of the series have equal coefficients. Evidently, if there 
is an even number of terms, the coefficients will all be paired. 
If there is an odd number of terms, there will be a middle term 
whose coefficient is not paired. From the manner in which the 
right-hand subscripts run in the coefficients, it is evident that 
there are n + 1 terms in the expansion. This is also evident 
from the manner in which the exponents of b run, as well as those 
of a also. In the series as written above, it may be noticed that 
the right-hand subscript and the exponent of b are always the 
same. From the theorem in combinations just used, the right-hand
subscript for the coefficient of any term may just as well
be made the same as the exponent of a in that term. 

By writing down the values of these coefficients in terms of n, 
a law of formation may be found. 

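\[ {}_nC_1 = {}_nC_{n-1} = \frac{n!}{(n-1)!\, 1!} = n, \]

\[ {}_nC_2 = {}_nC_{n-2} = \frac{n!}{(n-2)!\, 2!} = \frac{n(n-1)}{2!} = \frac{n(n-1)}{1 \cdot 2}, \]

\[ {}_nC_3 = {}_nC_{n-3} = \frac{n!}{(n-3)!\, 3!} = \frac{n(n-1)(n-2)}{3!} = \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}, \]

\[ {}_nC_4 = {}_nC_{n-4} = \frac{n!}{(n-4)!\, 4!} = \frac{n(n-1)(n-2)(n-3)}{4!} = \frac{n(n-1)(n-2)(n-3)}{1 \cdot 2 \cdot 3 \cdot 4}, \]

\[ {}_nC_r = {}_nC_{n-r} = \frac{n!}{(n-r)!\, r!} = \frac{n(n-1)(n-2) \cdots (n-r+1)}{r!}, \]

and so on.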

It may be seen, at once, that any coefficient may be obtained 
by multiplying the numerator of the preceding coefficient by a 
factor one less than the last factor in that numerator and the 
denominator by a factor one greater than the last factor in that 
denominator. Thus, 

\[ {}_nC_3 = {}_nC_2 \cdot \frac{n-2}{3}, \]

and

\[ {}_nC_4 = {}_nC_3 \cdot \frac{n-3}{4}, \]

and so on.
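
This step-by-step formation of the coefficients lends itself to a short loop. The Python sketch below is one way to write it; `binomial_coefficients` is a hypothetical name, and `math.comb` is used only to confirm the result.

```python
from math import comb


def binomial_coefficients(n):
    """Coefficients of (a + b)**n, each formed from the one before it."""
    coeffs = [1]
    for r in range(n):
        # Multiply by the next numerator factor (n - r) and
        # divide by the next denominator factor (r + 1).
        coeffs.append(coeffs[-1] * (n - r) // (r + 1))
    return coeffs


print(binomial_coefficients(4))  # [1, 4, 6, 4, 1]
```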





The binomial series written out with the coefficients in terms 
of n, gives: 

\[ (a+b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{1 \cdot 2}\, a^{n-2}b^2 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\, a^{n-3}b^3 + \cdots + nab^{n-1} + b^n. \]


6. Rules for Forming Terms. —From the rule given above for 
forming any coefficient from the preceding coefficient, and 
remembering that in any term the exponent of b equals the last 
factor in the denominator of the coefficient, and that the 
exponent of a is always n minus the exponent of b, the following 
rule may be stated for deriving any coefficient from the 
preceding coefficient: 

For the coefficient of any term, multiply the coefficient of the
preceding term by the exponent of a in that term and divide by one
more than the exponent of b.

The exponent of b in the first term is 0, in the second term it is
1, in the third term 2, and so on. In general, the exponent of b
in any term is one less than the number of that term as it stands
in the series. The exponent of b in the (r + 1)th term is then
r. The exponent of a is n - r. The last factor of the
denominator of the coefficient is r. The last factor of the numerator is
n - r + 1. Now write the (r + 1)th term. It is

\[ \frac{n(n-1)(n-2) \cdots (n-r+1)}{r!}\, a^{n-r}b^r. \]

The (r + 1)th term may also be written as ${}_nC_r\, a^{n-r}b^r$.

7. Pascal’s Triangle.—It is of interest to note that the
coefficients of the binomial expansion for any value of n may be
developed in triangular form.

                    1   1
                  1   2   1
                1   3   3   1
              1   4   6   4   1
            1   5  10  10   5   1
          1   6  15  20  15   6   1

The first row, 1, 1, gives the coefficients of $(a + b)^1 = a + b$.
The second row, 1, 2, 1, gives the coefficients of $(a + b)^2 =
a^2 + 2ab + b^2$, and so on.






In this triangle, any number is equal to the sum of the two 
numbers just above it, one to the right and the other to the left. 
For example 6 in the fourth row equals 3 + 3, 10 in the fifth row 
equals 4 + 6, 1 in any row equals 1+0. To form additional 
rows start with 1 and proceed by adding the numbers two at a 
time, of the row last written. Thus the next row will be 1, 
1 + 6, 6 + 15, 15 + 20, 20 + 15, 15 + 6, 6 + 1, 1 + 0, or 1,
7, 21, 35, 35, 21, 7, 1. This set forms the coefficients in the
expansion of

\[ (a+b)^7 = a^7 + 7a^6b + 21a^5b^2 + 35a^4b^3 + 35a^3b^4 + 21a^2b^5 + 7ab^6 + b^7. \]


This triangular scheme is known as Pascal’s triangle. 
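
The rule for forming each row from the one above it can be sketched in Python. The function name `next_row` is an assumption; a zero is placed at each end of the row so that the outer 1's arise as 1 + 0, as described above.

```python
def next_row(row):
    """Form the next row by adding the numbers of `row` two at a time."""
    padded = [0] + row + [0]
    return [padded[i] + padded[i + 1] for i in range(len(padded) - 1)]


rows = [[1, 1]]            # first row: coefficients of (a + b)**1
while len(rows) < 7:
    rows.append(next_row(rows[-1]))

for row in rows:
    print(row)
```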

8. If n is Not a Positive Integer.—If n is not a positive integer, 
the laws for binomial expansion may still be followed, but there 
will be no end to the number of terms in the binomial series. For 
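example, the laws of binomial expansion give

\[ (a+b)^{1/2} = a^{1/2} + \tfrac{1}{2}\, a^{-1/2}b + \frac{(\tfrac{1}{2})(-\tfrac{1}{2})}{2!}\, a^{-3/2}b^2 + \frac{(\tfrac{1}{2})(-\tfrac{1}{2})(-\tfrac{3}{2})}{3!}\, a^{-5/2}b^3 + \cdots \]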

It is seen, at once, that the exponent of a will never become zero, 
permitting the expansion to stop. The infinite series resulting 
when n is not a positive integer have valuable applications when 
certain conditions of convergence are fulfilled. These conditions 
of convergence together with the proof of the binomial expansion 
when n is not a positive integer are in the province of higher 
algebra, and will not be presented here. It is sufficient to state 
that the conditions of convergence are satisfied in the expansion 
of (1 + x) n , if x is numerically less than one. 

For example, expanding $(1 - x)^{-1}$ by the binomial expansion
gives

\[ (1-x)^{-1} = 1 + (-1)(-x) + \frac{(-1)(-2)}{2!}(-x)^2 + \cdots = 1 + x + x^2 + x^3 + \cdots \]

This series never ends, or is an infinite series.
Since

\[ (1-x)^{-1} = \frac{1}{1-x}, \]

this same series may be obtained by actual division of 1 by 1 - x.

In this series, it is evident that if x > 1, the sum of the series 
may be made as large as desired by carrying the expansion out 
to a sufficiently large number of terms. 

If x is numerically less than 1, the series is convergent. By convergent is meant
that the sum approaches a definite limit as the number of terms 
is made to increase without limit. It is the well-known geometric 
progression in which the constant ratio is x. 
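
The behavior of the series can be observed numerically. The Python sketch below sums the first terms of the expansion of (1 + x)^n, forming each coefficient from the preceding one by the rule of Section 6; `binomial_series` and its parameters are illustrative names, and floating-point arithmetic is assumed to be adequate for the illustration.

```python
def binomial_series(n, x, terms):
    """Partial sum of the binomial series for (1 + x)**n."""
    total = 0.0
    coefficient = 1.0                        # coefficient of x**0
    for r in range(terms):
        total += coefficient * x ** r
        coefficient *= (n - r) / (r + 1)     # next coefficient from this one
    return total


# (1 - x)**-1 with x = 0.5: the partial sums approach 2.
print(binomial_series(-1, -0.5, 60))

# (1 - x)**-1 with x = 2: the partial sums 1 + 2 + 4 + ... grow without limit.
print(binomial_series(-1, -2.0, 10), binomial_series(-1, -2.0, 20))

# A positive integer exponent: the coefficients vanish and the series ends.
print(binomial_series(3, 2.0, 10))
```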



APPENDIX C 


LAWS OF PROBABILITY 

If there are several equally likely events that may happen, the 
probability that a given one of these events will happen is the 
ratio of the number of cases favorable to its happening to the total 
number of possible cases. 

For example, if a bag contains four white balls and three black
ones, all alike except for color, the probability of drawing at
random a white ball is 4/7. There are seven possible events, each the
drawing of a ball. Since the balls are alike except for color, these
events are equally likely. Of these seven events, four are
favorable to drawing a white ball. By the definition, then, the
probability of drawing a white ball is 4/7.

This means that if a ball is drawn and a record is made of
whether it is white or black, and then the ball is returned to the
sack and another drawing is made, the color of the ball recorded,
and the process repeated a great many times, a white ball may be
expected to be drawn about four times out of seven. In other
words, in the long run about 4/7, or 57 per cent, of the drawings
are expected to give white balls. It is necessary, of course, that
there be no greater likelihood of drawing any one ball than any
other.

If the event does not happen, it fails. In the above illustration
the probability of not drawing a white ball is 3/7. The probability
of happening plus the probability of failure equals unity, since
the number of favorable cases plus the number of unfavorable
cases equals the total number of possible cases.

Let p be the probability of happening and q be the probability
of failure; then

p + q = 1.

Since the event must either happen or fail, the sum of p and q 
represents certainty. 
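
The long-run meaning of the ratio can be illustrated by a simulated series of drawings with replacement. This Python sketch is only an illustration: the seed, the number of drawings, and the variable names are arbitrary choices, and the observed frequency will be near 4/7 rather than exactly equal to it.

```python
import random
from fractions import Fraction

white, black = 4, 3
p = Fraction(white, white + black)   # probability of a white ball
q = 1 - p                            # probability of failure

rng = random.Random(0)               # fixed seed so the run is repeatable
bag = ["white"] * white + ["black"] * black
draws = 100_000
hits = sum(rng.choice(bag) == "white" for _ in range(draws))
frequency = hits / draws

print(p, q, p + q)    # 4/7 3/7 1
print(frequency)      # near 0.57
```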

Theorem I.—The probability that all of a set of independent events
will occur is the product of the separate probabilities of each of the
single events. This is the compound probability of the
simultaneous happening of all the events or of their happening in
succession.

Proof.—Let $p_1$ be the probability of the happening of the first
event, $p_2$ that of the second, $p_3$ of the third, and so on to $p_n$, the
probability of the happening of the nth event. Let the number of
possible cases in connection with the first event be $m_1$ and let $a_1$ of
these be favorable to the happening of that event. Similarly,
$m_2$ and $a_2$ are the number of possible and of favorable cases
respectively to the happening of the second event, $m_3$ and $a_3$ for the
third event, and so on to $m_n$ and $a_n$ for the nth event. Then, by
definition,

\[ p_1 = \frac{a_1}{m_1}, \quad p_2 = \frac{a_2}{m_2}, \quad p_3 = \frac{a_3}{m_3}, \quad \ldots, \quad p_n = \frac{a_n}{m_n}. \]



Any one of the $m_1$ ways for the first event may be combined
with any one of the $m_2$ ways for the second event, and any one of
these combinations may be combined with any one of the $m_3$
ways for the third event, and so on. Then by the first theorem
proved in Appendix B, the number of possible ways for all events
is the product $m_1 m_2 m_3 \cdots m_n$. Any one of the $a_1$ favorable
cases for the first event may be combined with any one of the $a_2$
favorable cases for the second event and any one of these
combinations may be combined with any one of the $a_3$ favorable cases
for the third event, and so on. So, by the same theorem, the
total number of favorable cases for all the events is the product
$a_1 a_2 a_3 \cdots a_n$.

The probability p of the simultaneous or successive happening
of all the events is the ratio of the total number of favorable cases
to the total number of possible cases. Therefore,

\[ p = \frac{a_1 a_2 a_3 \cdots a_n}{m_1 m_2 m_3 \cdots m_n}. \]


But this fraction equals the product 

\[ \frac{a_1}{m_1} \cdot \frac{a_2}{m_2} \cdot \frac{a_3}{m_3} \cdots \frac{a_n}{m_n}. \]

This product is the product of the separate probabilities of each
of the single events. Therefore,

\[ p = p_1 p_2 p_3 \cdots p_n. \]

The probability of failure of each of these events is

\[ (1 - p_1),\ (1 - p_2),\ (1 - p_3),\ \ldots,\ (1 - p_n). \]





Corollary I.—The probability of the failure of all n events is
the continued product,

\[ (1 - p_1)(1 - p_2)(1 - p_3) \cdots (1 - p_n). \]

Corollary II.—The probability of the happening of the first
r of the n events and the failure of all the rest is

\[ p_1 p_2 p_3 \cdots p_r (1 - p_{r+1})(1 - p_{r+2}) \cdots (1 - p_n). \]

If one sack contains four white balls and three black ones, and 
another sack contains seven white balls and five black, and a ball 
is drawn from each sack, what is the probability that both balls 
will be white? 

Drawing a ball from one sack has nothing to do with what color 
will be drawn from the other sack. So the two drawings (events) 
are independent. 

\[ p_1 = \frac{a_1}{m_1} = \frac{4}{7}, \]

\[ p_2 = \frac{a_2}{m_2} = \frac{7}{12}, \]

\[ p = p_1 p_2 = \frac{4}{7} \cdot \frac{7}{12} = \frac{1}{3}. \]

So the probability that both balls will be white is 1/3. This
means that if a large number of pairs of drawings were performed,
about one-third of them would be expected to yield two white
balls.
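
The two-sack example can be verified by listing every equally likely pair of balls, which mirrors the counting used in the proof of Theorem I. The Python sketch below uses exact fractions; the variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

sack_1 = ["W"] * 4 + ["B"] * 3   # four white, three black
sack_2 = ["W"] * 7 + ["B"] * 5   # seven white, five black

pairs = list(product(sack_1, sack_2))                  # m1 * m2 possible cases
favorable = sum(1 for a, b in pairs if a == b == "W")  # a1 * a2 favorable cases

p_enumerated = Fraction(favorable, len(pairs))
p_product = Fraction(4, 7) * Fraction(7, 12)

print(len(pairs), favorable)      # 84 28
print(p_enumerated, p_product)    # 1/3 1/3
```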

Theorem II.—The probability that any one whatever of a set of 
mutually exclusive events will occur and all the others fail is the sum 
of the partial probabilities of the single events. 

Proof. First make clear the meaning of partial probability. 
Suppose a sack contains 27 balls, of which 7 are white, 3 are black, 
8 are red, and 9 are green. Let 4 white balls be marked with 
an X, 2 black ones marked with an X, 3 red ones marked with an 
X, and 7 green ones marked with an X. 

In drawing a ball with an X on it, any one of four mutually 
exclusive events may happen, namely, drawing a white ball with 
an X, a black ball with an X, a red ball with an X, or a green ball 
with an X. If a ball is drawn at random from the sack, the prob¬ 
ability that some given one of these four events will happen is the 
partial probability of the happening of that particular event. 

To prove the theorem, suppose there are n possible happenings,
that $a_1$ is the number of cases favorable to the happening of
the first of m mutually exclusive events, $a_2$ the number favorable
to the happening of the second, and so on to $a_m$, the number
favorable to the happening of the mth of the mutually exclusive
events. Also let $p_1, p_2, \ldots, p_m$ be the probability of the
happening of these events, respectively. Then, by definition of
probability,

\[ p_1 = \frac{a_1}{n}, \quad p_2 = \frac{a_2}{n}, \quad \ldots, \quad p_m = \frac{a_m}{n}. \]





These are partial probabilities. Now, the number of cases
favorable to the happening of any one whatever of the m events is

\[ a_1 + a_2 + \cdots + a_m. \]

Therefore, the probability of the happening of any event
whatever of the series is

\[ p = \frac{a_1 + a_2 + \cdots + a_m}{n} = \frac{a_1}{n} + \frac{a_2}{n} + \cdots + \frac{a_m}{n} = p_1 + p_2 + \cdots + p_m, \]

which was to be proved. The sum of the partial probabilities 
is called total probability. 

In the illustration of 27 balls in a sack, there are four mutually
exclusive events.

There are n = 27 possible happenings. The number favorable
to the first event, the drawing of a white ball with an X, is $a_1 = 4$.
The number favorable to the second event, drawing a black ball
with an X, is $a_2 = 2$. The number favorable to the third event
is $a_3 = 3$, and to the fourth event is $a_4 = 7$. The partial
probabilities of each of the events are

\[ p_1 = \tfrac{4}{27}, \quad p_2 = \tfrac{2}{27}, \quad p_3 = \tfrac{3}{27}, \quad p_4 = \tfrac{7}{27}, \]

respectively.

The probability of any event whatever of the set, that is, the
probability of drawing a ball with an X, is

\[ p = \tfrac{4}{27} + \tfrac{2}{27} + \tfrac{3}{27} + \tfrac{7}{27} = \tfrac{16}{27}. \]

This result is evident since, of the 27 balls, 16 are marked with
an X, and the probability of drawing a ball with an X is 16/27.
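
The sum of the partial probabilities can be checked with exact fractions. In this Python sketch the dictionary `marked` simply records the counts given in the illustration; the names are hypothetical.

```python
from fractions import Fraction

total_balls = 27
marked = {"white": 4, "black": 2, "red": 3, "green": 7}  # balls bearing an X

partial = {color: Fraction(k, total_balls) for color, k in marked.items()}
total_probability = sum(partial.values())

print(total_probability)  # 16/27
```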





Theorem III.—If the probability that an event will occur in
a single trial is p, and the probability it will fail is q, then the
probability that it will occur exactly r times, no more no less, in n trials is

\[ {}_nC_r\, p^r q^{n-r}. \]

Proof.—The probability the event will happen in a given trial
is p. Its happening or not happening has nothing to do with its
happening in another trial. The probability of its happening in
another given trial is also p. Then, by Theorem I, the
probability of its happening in any two given trials is the compound
probability $p \cdot p = p^2$. By the same theorem, the probability
of its failing to happen in all the rest of the n trials is $q^{n-2}$. Also,
by the same theorem, the probability of the simultaneous
occurrence of happening in any two given trials and failing in all the
rest is

\[ p^2 q^{n-2}. \]

Similarly, the probability of the simultaneous happening of
the event in any three given trials and failure in all the rest is

\[ p^3 q^{n-3}. \]

In general, the probability of the happening of the event in
any r given trials and failure in all the rest is

\[ p^r q^{n-r}. \]

The number of ways in which the r given trials may be chosen is

\[ {}_nC_r. \]

The probability of the happening of any one of these
combinations of r successes and n - r failures is

\[ p^r q^{n-r}. \]

When any one of these combinations occurs, it excludes the
happening of any other of the combinations. So, by Theorem
II, the probability of the happening of any one whatever of the
set of ${}_nC_r$ events is $p^r q^{n-r} + p^r q^{n-r} + \cdots$ to ${}_nC_r$ terms.

This sum equals

\[ {}_nC_r\, p^r q^{n-r}, \]

the total probability of the happening of the event exactly r
times, no more no less.

Theorem IV.—If the probability that an event will occur in a
single trial is p, and the probability that it will fail is q, then the
probability that it will occur at least r times in n trials (r times or
more) is the sum of the first (n - r + 1) terms in the expansion of
$(p + q)^n$; that is,

\[ p^n + {}_nC_1\, p^{n-1}q + {}_nC_2\, p^{n-2}q^2 + \cdots + {}_nC_r\, p^r q^{n-r}. \]




Proof.—In order to happen r times or more, it may happen
exactly n times, exactly n - 1 times, n - 2 times, and so on down
to n - (n - r) = r times. Now, by Theorem I, the probability
that the event will happen n times is

\[ p^n. \]

By Theorem III, the probability that it will happen exactly
n - 1 times is

\[ {}_nC_1\, p^{n-1}q, \]

the probability that it will happen exactly n - 2 times is

\[ {}_nC_2\, p^{n-2}q^2, \]

and so on until the probability that it will happen exactly r times
is

\[ {}_nC_r\, p^r q^{n-r}. \]

If the event happens n times in n trials, that excludes its
happening exactly n - 1 times, n - 2 times, or any other number of
times. A similar statement is true for exactly n - 1 times, and
for each of the above series, down to exactly r times. The
probability of the happening of any one whatever of this series of
events is, by Theorem II, the sum of the separate probabilities of
each event. This sum is

\[ p^n + {}_nC_1\, p^{n-1}q + {}_nC_2\, p^{n-2}q^2 + \cdots + {}_nC_r\, p^r q^{n-r}, \]

the probability sought.

The application of these theorems to point binomial frequency
distributions has been seen in the body of the text.
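
Theorems III and IV translate directly into two short functions. The Python sketch below is a minimal illustration with exact fractions; the names `prob_exactly` and `prob_at_least` are assumptions, and the example uses the white-ball probability 4/7 from the earlier illustration with three drawings.

```python
from fractions import Fraction
from math import comb


def prob_exactly(n, r, p):
    """Theorem III: probability of exactly r happenings in n trials."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


def prob_at_least(n, r, p):
    """Theorem IV: probability of r or more happenings in n trials."""
    return sum(prob_exactly(n, k, p) for k in range(r, n + 1))


p = Fraction(4, 7)
print(prob_exactly(3, 2, p))    # 144/343
print(prob_at_least(3, 2, p))   # 208/343
print(prob_at_least(3, 0, p))   # 1, the whole expansion of (p + q)**3
```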



APPENDIX D 


DERIVATIVES AND INTEGRALS 


1. Limit of a Variable.—To understand the meaning of
“derivative,” a clear idea of “limit of a variable” should be obtained.
A variable x is said to approach a constant L as its limit if, as x
varies under some law, the value of the difference between x and L
can be made to become and remain less than any assignable number,
no matter how small. The fact that L is the limit of x may be
written $x \to L$. This symbol is read “x approaches L as a limit.”

2. As an illustration of a variable and its limit, suppose a
weight hanging at rest on a coiled wire spring. If the weight is
drawn down and let go, the length of the supporting spring changes,
getting shorter, then longer, alternately, until the weight finally
comes to rest again. While the weight oscillates up and down,
the length of the spring is a variable, x, sometimes increasing,
sometimes decreasing. The amplitude of oscillation diminishes
according to a law governing springs in such a situation. After
a time the weight comes to rest, and the spring is at its original
length L. A situation has arisen such that the difference between
x and L has become zero, and so is less than any assignable
quantity, no matter how small. As the difference between x and L
will now remain zero, it remains less than any assignable quantity,
no matter how small. So by definition L is the limit of x. It
is to be noted that at one point in each oscillation, either up or
down, the value of x is equal to L, but the difference between x
and L does not remain zero. This variable finally reaches its
limit and ceases to vary.

3. Another illustration of a variable approaching a limit is 
given by the sum of the terms of a geometrical progression having 
a proper fraction as the constant ratio, when the number of terms 
may be made as large as we please. Let S n be the sum of the first 
n terms of the series formed by starting with 1 and forming each 
term by taking half of the preceding term. Thus: 


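\[ S_4 = 1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8}, \]

\[ S_6 = 1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16} + \tfrac{1}{32}, \]

\[ S_n = 1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots + \frac{1}{2^{n-1}}. \]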



As n changes, $S_n$ varies; n may be made as large as we please.
The limit of $S_n$, as n increases without limit, is 2. This is
written

\[ \lim_{n \to \infty} S_n = 2. \]

Or, $S_n \to 2$ as $n \to \infty$. That 2 is the limit may be seen in the
following manner. Suppose a stick 2 ft. long. Cut it in two in
the middle, making two pieces each 1 ft. long. Take half of one
of these pieces and add it to the other. These two pieces together
make a length of

\[ 1 + \tfrac{1}{2} = 1\tfrac{1}{2}. \]

Cut the remaining piece in two in the middle and add one of the
pieces to the two just added. This makes a total length of

\[ 1 + \tfrac{1}{2} + \tfrac{1}{4} = 1\tfrac{3}{4}. \]

Repeat the process with the remaining piece, adding half of it
to the lengths already added, giving

\[ 1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} = 1\tfrac{7}{8}. \]

The sum always is less than 2 by the length of the piece left
over. The lengths of the pieces left over are, successively, 1, 1/2,
1/4, 1/8, and so on. At the (n + 1)th division, the length of the
piece left over is $1/2^n$, and the pieces added together make

\[ 1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots + \frac{1}{2^n}. \]

Now, by making n large enough, 1/2ⁿ may be made as small as we 
please, though never zero. Since the sum of the pieces added 
together differs from 2 by the length of the piece left over, this 
difference may be made as small as we please by making n large 
enough. Suppose, then, a small quantity is named. The differ¬ 
ence between 2 and the sum of the pieces added together may 
be made less than this small quantity simply by making n large 
enough. From that point on, as the process of adding half the 
piece left over is continued, the difference between 2 and the 
pieces added together will remain less than the small quantity that 
was named. This can be done no matter how small a quantity 
may be named. So the process enables us to make the sum of 
the pieces added together become and remain different from 2 
by less than any assignable number, no matter how small. 
Therefore, by definition, 2 is the limit of the sum of the terms 

$$1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n}$$

as n is made to increase without limit. 

It is to be noted that there is always a piece left over and the 
variable sum will never actually reach its limit of 2. In this 
respect this variable differs from the variable length of the spring 
in the first illustration. 
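
The behaviour of the sum in the third illustration may be checked by direct computation. The following Python sketch is an editorial illustration, not part of the original text; the name `partial_sum` is an assumed, hypothetical helper. It computes Sₙ exactly and prints the piece left over, 2 − Sₙ, which halves at each step but never becomes zero.

```python
from fractions import Fraction


def partial_sum(n):
    """Exact sum of the first n terms of 1 + 1/2 + 1/4 + ..."""
    return sum(Fraction(1, 2**k) for k in range(n))


for n in (1, 2, 4, 6, 10, 20):
    s = partial_sum(n)
    left_over = 2 - s  # length of the piece of the 2-ft. stick not yet added
    print(n, float(s), float(left_over))
```

Exact fractions are used so that the piece left over is seen to be exactly 1/2ⁿ⁻¹ for the sum of n terms, with no rounding.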

4. The variable length of spring was alternately larger and 
smaller than the limit value. During the process of variation, 
the length became the same as the limit a number of times. The 
sum of the lengths of pieces of stick is always less than the limit. 
This variable never can reach the limit value. A variable might 
always be greater than its limit. It may reach the limit value or 
it may not, according to the nature of the variable. The impor¬ 
tant consideration is that the process of variation may be contin¬ 
ued till the difference between the variable and its limit becomes 
smaller than any specified quantity, no matter how small; and 
that as the process of variation continues, this difference remains 
less than the specified quantity, no matter how small that 
quantity may be. 

5. Tangent to a Curve.—In Fig. 67, consider the two points P 
and Q on a curve, and the secant line PQ through these points. 



Fig. 67.—Tangent to a curve. Derivative. 


Let the coordinates of P be x and y. In going from the point P 
to the point Q, the x-coordinate changes by an amount PA = 
Δx. (Δx is called an increment of x. It is read “delta x” or 
“increment of x.” Δ is a symbol indicating “increment of,” 
and is not a quantity multiplying x.) The y-coordinate changes 
by an amount AQ = Δy (increment of y). So the coordinates 
of Q are x + Δx and y + Δy. The slope of the secant PQ is 

$$\frac{AQ}{PA} = \frac{\Delta y}{\Delta x}.$$





6. Now let Q move along the curve toward P, keeping the secant 
line through P and Q as Q moves. Δx and Δy grow smaller and 
smaller. The secant line turns on the point P. As it turns, 
its slope is always equal to Δy/Δx. Since the slope is changing, Δx 
and Δy do not change so as to maintain a constant ratio. 

7. Let Q move along the curve toward P as a limiting position. 
This means that Q moves in such a manner that its distance, on the 
curve, from P can be made to become and remain less than any 
assigned quantity no matter how small. One is not concerned 
whether Q actually reaches P or not. As Q moves along the 
curve toward P as a limiting position, the secant PQ turns about 
P towards a limiting position. The straight line which has the 
limiting position of the secant is called the tangent to the curve 
at the point P. As long as there is a limiting position for the secant, 
it makes no difference whether the secant ever reaches that position 
or not, as the line having the limiting position is the tangent. 

8. Slope of the Tangent.—In a smooth, continuous curve such 
as shown in Fig. 67, there must be one such limiting position at P 
and therefore, one tangent, such as PT. This tangent has a 
definite slope. Since, as the secant PQ turns on P, its varying 
slope always equals the varying ratio Δy/Δx, the limit value of 
Δy/Δx, as Q approaches P as a limiting position, will be the slope 
of the tangent at P. 

9. There are points on certain curves such that the limit value 
of Δy/Δx is not unique or may not exist. Such curves will not 
be taken up in this text. The further developments in this 
appendix are understood to apply to smooth continuous curves 
that have no such peculiar points. 

10. Function.—Suppose the equation of the curve is given as 
y equals some function of x, written 

$$y = f(x).$$

This is a generalization which simply means that y equals some 
mathematical expression involving x, such as y = 3x − 1, y = ¼x² + 2, 
y = 2 log x, y = 1/x, etc. If the x-coordinate of some point 
on the curve is given, the corresponding y-coordinate is, at once, 
found by putting the given value for x in the function and computing 
the resulting value of the function. For example, in y = ¼x² + 2, 
if x = 2, y = ¼(2)² + 2 = 3. Thus the point (2, 3) 
is on the curve represented by y = ¼x² + 2. 

Suppose, then, that on the curve y = f(x), a point P is taken 
whose coordinates are x and y. Now change x by an increment 
Δx; y will also change by a corresponding increment Δy, giving 
another point Q on the curve whose coordinates are x + Δx and 
y + Δy. The value of y + Δy is found by putting x + Δx for 
x in the function so that 

$$y + \Delta y = f(x + \Delta x).$$

In the above example, 

$$y + \Delta y = \tfrac{1}{4}(x + \Delta x)^2 + 2.$$

The ratio Δy/Δx is the slope of the secant through P and Q. 

11. Slope of the Tangent, Derivative.—To find the slope of the 
tangent at P, one must be able to find the limit value of Δy/Δx as 
Δx approaches zero, that is, as Q moves along the curve toward 
P as a limiting position. This is expressed thus: 

$$\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \frac{dy}{dx}.$$

It is read: “The limit of Δy/Δx as Δx approaches 0 as a limit 
is dy/dx.” This quantity, dy/dx, is called the derivative of y with 
respect to x. It is read “derivative of y with respect to x” or 
simply “dy over dx.” The process of finding the derivative is 
called differentiation. This limit value is, in general, a function 
of x, the abscissa of the point at which the tangent is drawn. So, 
to obtain the slope of the tangent at any point whose abscissa is 
known, substitute the abscissa of the point for x in dy/dx, the 
derived function. As with Δ in Δx, so here the d is not 
a quantity, but merely a symbol of the result of a process. The 
derivative, dy/dx, is of fundamental importance in differential 
calculus and its applications. 
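
The approach of the secant slope to a limit may be observed numerically. The sketch below is an editorial illustration in Python and is not part of the original text; `secant_slope` is an assumed, hypothetical name. It uses the example curve of Sec. 10, y = ¼x² + 2, at the point where x = 2, and prints Δy/Δx for smaller and smaller values of Δx.

```python
def f(x):
    """The example curve of Sec. 10: y = x^2/4 + 2."""
    return 0.25 * x**2 + 2


def secant_slope(func, x, dx):
    """Slope of the secant through (x, f(x)) and (x + dx, f(x + dx))."""
    return (func(x + dx) - func(x)) / dx


for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, secant_slope(f, 2.0, dx))
```

For this curve the secant slope at x = 2 works out algebraically to 1 + Δx/4, so the printed values settle toward 1, the slope of the tangent at the point (2, 3).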

12. Derivative of xⁿ.—In the calculus, rules are derived for 
writing down the derivatives of different types of functions. As 
an example of one method of attack for obtaining these rules, 
take the function 

$$y = x^2.$$

Change x by the increment Δx. The function becomes 

$$y + \Delta y = (x + \Delta x)^2 = x^2 + 2x\,\Delta x + (\Delta x)^2.$$

Subtracting 

$$y = x^2$$

leaves 

$$\Delta y = 2x\,\Delta x + (\Delta x)^2.$$

Then dividing by Δx, 

$$\frac{\Delta y}{\Delta x} = 2x + \Delta x.$$

It follows: 

$$\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = 2x,$$

since as Δx approaches 0 as a limit, 2x + Δx will approach 2x as a 
limit. 

Take the more general function 

$$y = x^n.$$

$$y + \Delta y = (x + \Delta x)^n.$$

$$y + \Delta y = x^n + nx^{n-1}\,\Delta x + \frac{n(n-1)}{2!}\,x^{n-2}(\Delta x)^2 + \text{terms in higher powers of } \Delta x.$$

Subtracting y = xⁿ leaves 

$$\Delta y = nx^{n-1}\,\Delta x + \frac{n(n-1)}{2!}\,x^{n-2}(\Delta x)^2 + \text{terms in higher powers of } \Delta x.$$

Dividing by Δx, 

$$\frac{\Delta y}{\Delta x} = nx^{n-1} + \frac{n(n-1)}{2!}\,x^{n-2}\,\Delta x + \text{terms in higher powers of } \Delta x.$$

Since Δx may be made as small as we please, Δx times any finite 
quantity may be made as small as we please. When Δx becomes 
less than unity, as it approaches 0 as a limit, higher powers of Δx 
will approach 0 more rapidly than does Δx itself. Then the 
terms following nxⁿ⁻¹ each approach 0 as a limit. If there is a 
finite number of these terms, or if their sum is absolutely convergent, 
the sum approaches 0 as a limit. Therefore the limit, 

$$\frac{dy}{dx} = nx^{n-1}.$$

This result translated into English reads: “The derivative of 
the nth power of x is n times x with the exponent diminished by 
one.” This gives a rule for differentiating powers of x which may 
be proved to hold when n is negative or fractional as well 
as integral. 

If 

$$y = ax^n,$$

each term of the above expansion of (x + Δx)ⁿ is multiplied by a 
and 

$$\frac{dy}{dx} = anx^{n-1}.$$





The rules for differentiation of other functions together with 
their derivation are given in works on differential calculus. 

The rule given above makes the derivative of a constant equal 
to 0. The constant a may be written ax⁰. Now applying the 
rule, the derivative is equal to a · 0 · x⁻¹, which equals 0. Or, 
if y = a, a constant, any change in x does not affect y and so Δy 
would be 0 and Δy/Δx = 0. This remains 0 all the time as Δx 
approaches 0 as a limit. Hence dy/dx = 0 when y equals a 
constant. 
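
The rule for powers of x lends itself to mechanical application. The following Python sketch is an editorial illustration, not part of the original text. It assumes that a sum of terms may be differentiated term by term, and it represents a polynomial by a list of its coefficients; the names `differentiate` and `evaluate` are hypothetical helpers chosen for the illustration.

```python
def differentiate(coeffs):
    """Differentiate a0 + a1*x + a2*x^2 + ... given as [a0, a1, a2, ...].

    Each term a*x^n becomes a*n*x^(n-1); the constant term drops out,
    in agreement with the rule that the derivative of a constant is 0.
    """
    return [n * a for n, a in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(a * x**n for n, a in enumerate(coeffs))


curve = [2, 0, 0.25]            # y = 2 + 0*x + (1/4)*x^2
slope = differentiate(curve)    # coefficients of dy/dx = x/2
print(slope, evaluate(slope, 2))
print(differentiate([5]))       # a constant: no terms remain
```

An empty list stands here for the function that is everywhere 0.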

13. Application.—Take again the function 

$$y = \tfrac{1}{4}x^2 + 2.$$

Applying the rule for differentiation, 

$$\frac{dy}{dx} = \frac{2x}{4} = \frac{x}{2}.$$

Since the slope of the tangent to the curve, which is the graph 
of y = f(x), at any point on the curve whose x-coordinate is 
given, is the value of dy/dx with the given value of x substituted 
for x in the derivative, it is seen at once that the slope of the 
tangent to the curve 

$$y = \tfrac{1}{4}x^2 + 2$$

at the point whose x-coordinate is x₁ is x₁/2. 




From the tabulation of values of y for varying values of x, the 
curve of Fig. 68 is plotted. 



At x = 2 for the point P, dy/dx = 2/2 = 1. Now the tangent 
may be drawn at P with a slope of 1. A line with slope 1 rises 
vertically the same amount that it proceeds horizontally to the 
right. 

14. Maximum.—Recall that the y-coordinate, known as the ordinate, 
of a point on a curve determined by y = f(x) is the value of 
the function for that value of x which is the x-coordinate, or 
abscissa, of that point. A function of x is said to have a maximum 
value for that value x₁ of x which makes the function 
larger than for other values of x in the neighborhood of x₁, either 
greater or less than x₁. 

In Fig. 69, at x = x₁ = OB, the value of the function represented 
by the curve is y = BA. This is greater than the ordinate of a 
point on the curve in the neighborhood of A on either side of A. 
Then y = BA is a maximum for x = x₁ = OB. The function is 
said to be a maximum for x = OB. Similarly, at x = OF, there 
is another point E on the curve for which the same statement is 
true; y = FE is then a maximum value of the function. This 
curve has two maxima. This merely means that there are two 
points on the curve, each of which is higher than any points 
of the curve in its immediate neighborhood. It is said that 
x = OB makes f(x) a maximum and x = OF makes f(x) a maximum. 
The maximum values of the function are, of course, 
found by substituting for x in the function the value of x that 
makes the function a maximum. 

15. Minimum.—In like manner, x = OD determines a point C 
on the curve such that ordinates in its immediate neighborhood, 
on both sides of C, are greater than DC, the ordinate at C. Then 
y = DC is called a minimum value of the function. The function 
is said to be a minimum for x = OD. The minimum value of 
the function is found by substituting OD for x in the function. 

Some functions of x have neither a maximum nor a minimum 
for any value of x. Some may have several maxima and several 
minima. 

16. With the sort of functions considered in this appendix, the 
tangent to the graph at maximum or minimum points is horizon¬ 
tal, or parallel to the x-axis. In other words, its slope is 0. At a 
maximum or a minimum point, then, there is a value of x which 
makes the derivative of the function zero. 

17. Increasing and Decreasing Functions.—At any point on 
the graph of y = f(x) where the curve is going up to the right, 
the function is said to be increasing for the corresponding value of 
x, since it increases with increase of x. At any point where the 
curve is going down to the right, the function is said to be a 
decreasing function for the corresponding value of x. 

Fig. 70.—Increasing and decreasing functions. 

In Fig. 70, (a) and (b) represent increasing functions for all 
values of x shown, while (c) and (d) represent decreasing functions. 
Note that at any point on either (a) or (b) the tangent slopes up to 
the right, that is, has a positive slope. This means that if the 
x-coordinate of any point on such a curve be substituted for x 
in the derivative of the function, the result will be a positive 
quantity. Similarly, for a decreasing function at any value of 
x, the derivative is negative. 

18. Test for Maximum or Minimum.—If, on the curve y = f(x) 
in Fig. 69, we pass to the right approaching the high point A, the 
slope of the tangent is positive, decreasing to zero at A; and then 
continuing to decrease, it becomes negative. Since the derivative 
of f(x) determines the slope of the tangent, it now follows that 
as the high point A is approached and passed through, the derivative 
is a decreasing function, passing from positive to negative 
through the value 0 at the high point A. It follows then 
that if the x-coordinate of A, a high point, be substituted for 
x in the derivative of the derivative, a negative result will be 
obtained. The derivative of the derivative of y = f(x) is called 
the second derivative of the function. A symbol for the second 
derivative of y with respect to x is d²y/dx²; dy/dx is called the 
first derivative. 

A similar argument will show that if the x-coordinate of the low 
point C be substituted for x in d²y/dx², the result will be positive, 
since at C the slope of the tangent, dy/dx, is an increasing function, 
changing from negative to zero at C and then becoming 
positive. 

19. Rules.—Rules may now be stated for determining maxi¬ 
mum and minimum values of the kind of functions being discussed. 

Having given y = f(x), find dy/dx. Set dy/dx = 0 and solve 
for x. The resulting value or values of x determine the high 
points and low points on the graph of y = f(x), since at those 
points the slope of the tangent is zero. If x₁ is one such value of 
x, substitute x₁ for x in f(x). The result is the corresponding 
maximum or minimum value of the function as the case may be. 

To determine whether the resulting value is a maximum or a 
minimum, find d²y/dx². Substitute x₁ for x in d²y/dx². If 
the result is negative, a maximum value of f(x) was determined. 
If the result is positive, a minimum value of f(x) was determined. 
If the result is zero, neither positive nor negative, further 
investigation is necessary. 





20. Illustrative Example.—Take, for example, the function 

$$y = x^3 - 6x^2 + 9x + 2.$$

Differentiating gives 

$$\frac{dy}{dx} = 3x^2 - 12x + 9.$$

Set 

$$3x^2 - 12x + 9 = 0.$$

Solving for x gives 

$$x = 1 \text{ or } 3.$$

A second differentiation gives 

$$\frac{d^2y}{dx^2} = 6x - 12.$$

For x = 1, 

$$\frac{d^2y}{dx^2} = 6 \cdot 1 - 12 = -6$$

is negative. Therefore, y has a maximum value at x = 1. 
This value is 

$$y = 1^3 - 6 \cdot 1^2 + 9 \cdot 1 + 2 = 6.$$

For x = 3, 

$$\frac{d^2y}{dx^2} = 6 \cdot 3 - 12 = 6$$

is positive. Therefore, y has a minimum value at x = 3. This 
value is 

$$y = 3^3 - 6 \cdot 3^2 + 9 \cdot 3 + 2 = 2.$$

The graph of the curve is shown in Fig. 71. This shows 
the maximum value of 6 at x = 1 and the minimum of 2 at x = 3. 
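
The rules of Sec. 19 may be followed step by step for this example. The Python sketch below is an editorial illustration, not part of the original text; the function names are hypothetical, and the quadratic formula is used as one convenient way of solving dy/dx = 0 for this particular function.

```python
import math


def y(x):
    return x**3 - 6 * x**2 + 9 * x + 2


def d2y(x):
    """Second derivative of y: 6x - 12."""
    return 6 * x - 12


def critical_points():
    """Solve dy/dx = 3x^2 - 12x + 9 = 0 by the quadratic formula."""
    a, b, c = 3, -12, 9
    root = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def classify(x):
    """Apply the second-derivative test at a value of x where dy/dx = 0."""
    s = d2y(x)
    if s < 0:
        return "maximum"
    if s > 0:
        return "minimum"
    return "further investigation necessary"


for x in critical_points():
    print(x, y(x), classify(x))
```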

Fig. 71. 

21. Speed and Acceleration.—The first derivative is used to 
determine the rate of change of a function. If x represents time 
and y space passed over in time x, represent the variables by t 
and s. An equational relationship between space passed over in 
a given time takes the general form 

$$s = f(t).$$

Then Δs is the space passed over in time Δt, and Δs/Δt is the 
average speed or velocity during time Δt. 

$$\frac{ds}{dt} = \lim_{\Delta t \to 0} \frac{\Delta s}{\Delta t}$$

is defined as the instantaneous speed or velocity at time t. That 
is, ds/dt is the time rate of change of s at any instant t. If v 
stands for velocity, then 

$$v = \frac{ds}{dt}.$$

Similarly, acceleration is the rate of change of velocity. Then 
acceleration is 

$$\frac{dv}{dt} = \frac{d^2s}{dt^2}.$$

A body falling under the law 

$$s = \tfrac{1}{2}gt^2$$

has at any instant a velocity 

$$v = \frac{ds}{dt} = gt$$

and an acceleration 

$$\frac{dv}{dt} = \frac{d^2s}{dt^2} = g.$$

The constant g is called the gravitational constant. 
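
The passage from average speed to instantaneous speed may be illustrated numerically for the falling body. The Python sketch below is an editorial illustration, not part of the original text. The value G = 32.2 ft. per sec. per sec. is an assumption made for the illustration (the text gives no numerical value of g), and `average_speed` is a hypothetical name.

```python
G = 32.2  # assumed value of g, in feet per second per second


def s(t):
    """Space passed over in time t by a body falling under s = (1/2) g t^2."""
    return 0.5 * G * t**2


def average_speed(t, dt):
    """Average speed over the interval from t to t + dt."""
    return (s(t + dt) - s(t)) / dt


t = 2.0
for dt in (1.0, 0.1, 0.001):
    print(dt, average_speed(t, dt))
print("g * t =", G * t)
```

For this law the average speed over the interval is gt + ½g·Δt, so the printed values draw toward gt as Δt is made smaller.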

22. A Function of a Function.—If y is a function of z and z is a 
function of x, dy/dx may be found directly. If x be given an 
increment, Δx, z takes an increment Δz. Then y takes an increment 
Δy. Now 

$$\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta z} \cdot \frac{\Delta z}{\Delta x}.$$

It can be shown that the limit of the product of two variables is 
the product of their limits. Therefore, from the above equation, 

$$\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx}.$$

As an illustrative example, let y = z³ and z = x² − 2x + 1. 
Then, 

$$\frac{dy}{dz} = 3z^2, \qquad \frac{dz}{dx} = 2x - 2,$$

and 

$$\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx} = 3z^2(2x - 2) = 3(x^2 - 2x + 1)^2(2x - 2).$$

23. This rule for differentiating a function of a function could 
be applied directly to such a function as 

$$y = (x^2 - 2x + 1)^3.$$

Let 

$$z = x^2 - 2x + 1.$$

Then 

$$y = z^3,$$

and exactly the above example results. Write the following rule 
for differentiation of a function of x of the form y equals a power of 
a function of x. 

Multiply the function of x by the index of the power, diminish 
the index of the power by 1, and multiply the result by the derivative 
of the function of x. 

As an example, take 

$$y = (x^3 - 3x^2 + 2x - 1)^4.$$

$$\frac{dy}{dx} = 4(x^3 - 3x^2 + 2x - 1)^3(3x^2 - 6x + 2).$$
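
The rule for a power of a function of x may be checked against a direct numerical estimate of the slope. The Python sketch below is an editorial illustration, not part of the original text. The symmetric difference quotient used in `dy_dx_numeric` is an assumption introduced here for accuracy; it is not the one-sided ratio Δy/Δx used in the text, and all function names are hypothetical.

```python
def y_of_x(x):
    return (x**3 - 3 * x**2 + 2 * x - 1) ** 4


def dy_dx_by_rule(x):
    """Index times the function to the diminished power, times its derivative."""
    inner = x**3 - 3 * x**2 + 2 * x - 1
    d_inner = 3 * x**2 - 6 * x + 2
    return 4 * inner**3 * d_inner


def dy_dx_numeric(x, dx=1e-6):
    """Estimate of the slope from two nearby points on either side of x."""
    return (y_of_x(x + dx) - y_of_x(x - dx)) / (2 * dx)


for x in (0.0, 1.0, 2.0):
    print(x, dy_dx_by_rule(x), dy_dx_numeric(x))
```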

24. A Function of Two Variables.—Sometimes y may be given 
as a function of two variables x and z. In such a case, to get a 
geometric interpretation, three coordinate planes may be 
used, mutually perpendicular. These planes intersect in three 
mutually perpendicular lines which are used as coordinate axes. 
Let Ox (Fig. 72) be the x-axis and Oz be the z-axis, both in the 
horizontal plane, and let Oy be the y-axis which is vertical. The 
coordinate planes are the horizontal plane, xOz, the vertical plane, 
xOy, and the vertical plane, zOy. Suppose the latter to be per¬ 
pendicular to the plane of the paper. 

25. Three Coordinates.—A point in space has three coordinates, 
x, y, and z, written (x, y, z). For example, the point 
(1, 2, 3) is one unit distant from the zOy-plane, two units distant 
from the xOz-plane, and three units distant from the xOy-plane. 

Let the positive direction for x be to the right, for y upward, 

and for z toward the front. The opposite directions are then 
negative. 




26. Graph of f(x, z).—An equation giving y as a function of x 
and z, 

$$y = f(x, z),$$

represents a surface. As soon as values are assigned to x and z, y 
is determined. In Fig. 72, for x = OD = x₁ and z = OH = 
z₁ the point E is determined in the xOz-plane. If y = f(x, z) is 
the equation of the surface ABC, then when OD = x₁ is substituted 
for x, and OH = z₁ for z in the function, the value y = EP 
= y₁ is at once known, determining the point P on the surface. 
This is expressed by 

$$y_1 = f(x_1, z_1).$$

Fig. 72.—Three-dimensional coordinates. 

27. Partial Derivative.—Every point in the plane FDG, passed 
perpendicular to Ox at D, has an x-coordinate equal to OD = x₁. 
This plane cuts the surface in the curve FPG. The equation of 
this curve in the plane FDG is 

$$y = f(x_1, z),$$

in which x₁ is a constant equal to OD. The slope of the tangent 
to this curve in the plane FDG at a point on the curve is the derivative 
of y with respect to z, letting x remain constant and equal 
to x₁. Such a derivative is called the partial derivative of y with 
respect to z. Its symbol is ∂y/∂z. Similarly, the slope of the 
tangent to the curve KPL, in the plane KHL, at a point on the 
curve is ∂y/∂x, letting z remain constant and equal to OH = z₁. 

28. Conditions Necessary for Maximum or Minimum.—Now, 
in ∂y/∂z, substitute for z the z-coordinate of the point P, and the 
result gives the slope of the tangent to the curve FPG at P. Let 
this tangent be PM. Similarly, in ∂y/∂x, substitute the x-coordinate 
of P for x, and the result gives the slope of the tangent to 
the curve KPL at P. Let this tangent be PN. 

If EP = y₁ were a maximum value of the function f(x, z), it is 
evident that it would have to be a maximum for the curve y = 
f(x₁, z) and the curve y = f(x, z₁). This means that the tangents 
PM and PN would both be horizontal. In other words, necessary 
conditions for a maximum would be 

$$\frac{\partial y}{\partial x} = 0, \qquad \frac{\partial y}{\partial z} = 0.$$

Evidently these conditions must also hold for a minimum. 

The partial derivative ∂y/∂x may be again differentiated with 
respect to x leaving z constant, or with respect to z leaving x 
constant. If the differentiations are both with respect to x leaving 
z constant, the symbol is ∂²y/∂x². If the first differentiation is 
with respect to x and the second one with respect to z, the symbol 
is ∂²y/∂x∂z. 

For either a maximum or a minimum, it can be shown that 
(∂²y/∂x∂z)² must be less than (∂²y/∂x²)(∂²y/∂z²). 
So the conditions for either a maximum or a minimum are 

$$\frac{\partial y}{\partial x} = 0, \qquad \frac{\partial y}{\partial z} = 0, \qquad \left(\frac{\partial^2 y}{\partial x\,\partial z}\right)^2 < \frac{\partial^2 y}{\partial x^2} \cdot \frac{\partial^2 y}{\partial z^2}.$$

In addition to these conditions, add that for a maximum 

$$\frac{\partial^2 y}{\partial x^2} < 0 \quad \text{and} \quad \frac{\partial^2 y}{\partial z^2} < 0,$$

and for a minimum 

$$\frac{\partial^2 y}{\partial x^2} > 0 \quad \text{and} \quad \frac{\partial^2 y}{\partial z^2} > 0$$

at the point in question. These conditions may be taken 
without further proof. 

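29. Illustrative Example.—As an example, test 

$$y = 3xz - x^3 - z^3$$

for maxima and minima. 

$$\frac{\partial y}{\partial x} = 3z - 3x^2,$$

$$\frac{\partial y}{\partial z} = 3x - 3z^2.$$

Equating each to zero, gives 

$$z - x^2 = 0,$$

$$x - z^2 = 0.$$

Solving for x and z gives 

$$x = 0 \text{ or } 1,$$

$$z = 0 \text{ or } 1.$$

Now, 

$$\frac{\partial^2 y}{\partial x^2} = -6x, \qquad \frac{\partial^2 y}{\partial z^2} = -6z, \qquad \frac{\partial^2 y}{\partial x\,\partial z} = 3.$$

For x = 0 and z = 0, 

$$\frac{\partial^2 y}{\partial x^2} = 0, \qquad \frac{\partial^2 y}{\partial z^2} = 0, \qquad \frac{\partial^2 y}{\partial x\,\partial z} = 3.$$

$$\left(\frac{\partial^2 y}{\partial x\,\partial z}\right)^2 = 9$$

is not less than 

$$\frac{\partial^2 y}{\partial x^2} \cdot \frac{\partial^2 y}{\partial z^2} = 0.$$

Therefore, there is neither a maximum nor minimum at x = 0, 
z = 0. 

For x = 1 and z = 1, 

$$\frac{\partial^2 y}{\partial x^2} = -6, \qquad \frac{\partial^2 y}{\partial z^2} = -6, \qquad \frac{\partial^2 y}{\partial x\,\partial z} = 3.$$

$$\left(\frac{\partial^2 y}{\partial x\,\partial z}\right)^2 = 9, \qquad \frac{\partial^2 y}{\partial x^2} \cdot \frac{\partial^2 y}{\partial z^2} = 36.$$

Thus the conditions for a maximum or a minimum are fulfilled 
at x = 1, z = 1. Both ∂²y/∂x² and ∂²y/∂z² at this point are 
negative. Therefore, there is a maximum value of y at x = 1, 
z = 1. This maximum value is obtained by substituting 1 for x 
and 1 for z in the original equation. This gives 

$$y = 3 \cdot 1 \cdot 1 - 1^3 - 1^3 = 1$$

as the maximum value. This means that if the surface represented 
by the original equation were constructed, the point on 
the surface at x = 1, z = 1 would be higher than any points in its 
immediate neighborhood. It would be at a height of 1. 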
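
The conditions of Sec. 28 may be applied mechanically to this example. The Python sketch below is an editorial illustration, not part of the original text; the function names are hypothetical, and the second partial derivatives are written in by hand from the example rather than computed by the program.

```python
def y(x, z):
    return 3 * x * z - x**3 - z**3


def second_partials(x, z):
    """Second partial derivatives of y = 3xz - x^3 - z^3.

    Returned in the order: with respect to x twice, with respect to z
    twice, and the mixed derivative with respect to x and z.
    """
    return -6 * x, -6 * z, 3


def classify(x, z):
    """Apply the conditions of Sec. 28 at a point where both first partials are 0."""
    yxx, yzz, yxz = second_partials(x, z)
    if not yxz**2 < yxx * yzz:
        return "condition not fulfilled"
    # When the condition holds, yxx and yzz necessarily have the same sign.
    return "maximum" if yxx < 0 else "minimum"


for point in ((0, 0), (1, 1)):
    print(point, y(*point), classify(*point))
```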

30. Application to Least Squares Adjustment.—In Chap. VIII, 
Sec. 8, the method of finding the straight line of closest fit by 
least squares adjustment was found to depend on finding the 
values of a and b such that 

$$\sum (ax + b - y)^2$$

should be a minimum. Now find the required values of a and b 
by the above conditions and tests. 

Let 

$$u = \sum (ax + b - y)^2.$$

The unknown quantities to be determined are the values of a 
and b to make u a minimum. 

$$\frac{\partial u}{\partial b} = 2\sum (ax + b - y),$$

$$\frac{\partial u}{\partial a} = 2\sum (ax + b - y)x.$$

The equations to solve for a and b are then 

$$2\sum (ax + b - y) = 0$$

and 

$$2\sum (ax + b - y)x = 0.$$

These reduce to 

$$a\sum x + Nb - \sum y = 0,$$

$$a\sum x^2 + b\sum x - \sum xy = 0.$$

Solving for a and b gives 

$$a = \frac{\sum x \cdot \sum y - N\sum xy}{\left(\sum x\right)^2 - N\sum x^2},$$

$$b = \frac{\sum x \cdot \sum xy - \sum x^2 \cdot \sum y}{\left(\sum x\right)^2 - N\sum x^2}.$$

Now 

$$\frac{\partial^2 u}{\partial b^2} = 2N, \qquad \frac{\partial^2 u}{\partial b\,\partial a} = 2\sum x, \qquad \frac{\partial^2 u}{\partial a^2} = 2\sum x^2.$$

The condition 

$$\left(\frac{\partial^2 u}{\partial b\,\partial a}\right)^2 < \frac{\partial^2 u}{\partial b^2} \cdot \frac{\partial^2 u}{\partial a^2}$$

gives 

$$\left(2\sum x\right)^2 < 2N \cdot 2\sum x^2.$$

It is easily shown that if x is increasing by constant differences, 

$$\left(2\sum x\right)^2 < 2N \cdot 2\sum x^2.$$

Moreover, ∂²u/∂b² = 2N and ∂²u/∂a² = 2Σx² are each positive, 
regardless of the values of a and b. It thus appears that the 
conditions are all fulfilled for giving a minimum value to u at the 
values of a and b obtained above. 
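
The formulas for a and b may be applied directly to a set of observations. The Python sketch below is an editorial illustration, not part of the original text; `fit_line` is a hypothetical name, and the four pairs of values are invented for the illustration so that they lie exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = ax + b.

    Uses the two formulas obtained by solving the equations
    a*Sx + N*b - Sy = 0 and a*Sxx + b*Sx - Sxy = 0.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = sx**2 - n * sxx
    a = (sx * sy - n * sxy) / denom
    b = (sx * sxy - sxx * sy) / denom
    return a, b


# Invented data lying exactly on y = 2x + 1, with x increasing by constant differences.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(fit_line(xs, ys))

# The condition (2*Sx)^2 < 2N * 2*Sxx for a minimum, checked on the same data.
print((2 * sum(xs)) ** 2 < 2 * len(xs) * 2 * sum(x * x for x in xs))
```

The sketch makes no provision for the case in which all values of x are equal, when the denominator is zero and no single line of closest fit is determined.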

This completes a brief consideration of two important applica¬ 
tions of differentiation, namely: (1) determination of maximum 
and minimum values, and (2) rates of change. 

31. Integration, Constant of Integration.—Integration is the 
inverse process of differentiation, just as subtraction is the inverse 
of addition, division the inverse of multiplication. In other 
words, to integrate y = f(x), find a function which differentiated 
will give f(x). 

For example, the integral of y = x² is x³/3, since x³/3 differentiated 
gives 3x²/3 = x². But x³/3 + 2, x³/3 − 6, and x³/3 
plus any constant C also differentiate into x². For this reason, 
when integrating a function, add a constant C to the result. This 
is called the constant of integration. Unless there are conditions 
which determine this constant, it may be any constant whatever. 

The symbol of integration is an elongated s. The integral of 
f(x) is written 

$$\int f(x)\,dx.$$

For example, 

$$\int x^2\,dx = \frac{x^3}{3} + C.$$

This is read “the integral of x square dx equals x³/3 plus a 
constant.” 

In general, 

$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C.$$

This holds except for n = −1. 

32. Just as a constant factor in a function to be differentiated 
appears in the derivative, so a constant factor in a function to be 
integrated appears in the resulting integral. Since the constant 
of integration may be any constant, the constant factor may be 
regarded as a factor of it also. It follows that a constant factor 
may be placed either side of the integral sign without changing 
the result. 

33. Geometric Meaning.—To get a geometric interpretation 
of the meaning of integration, suppose y = f(x) to be the equation 
of the curve in Fig. 73. If the point P starts at P₀, where x 
= OX₀ = a, and moves according to the law y = f(x) till x = 
OX₁ = b, P may be said to generate the curve from P₀ to P₁. At 
the same time, the ordinate of P, XP, sweeps over or generates 
the area P₀X₀X₁P₁ under the curve. 

Fig. 73.—Geometric meaning of integration. 

In taking any position of the generating point such as P, the 
ordinate XP at that point is y = f(x), where x = OX. The 
area generated up to XP is P₀X₀XP. Now give x an increment 
Δx = XR. The area A takes an increment ΔA. This increment 
of area is PXRQ between the curve, the x-axis, and the ordinates 
XP and RQ. The area ΔA as shown is greater than the rectangle 
PR and less than the rectangle QX. Always 

$$\Delta A > \Delta x \times (\text{the least ordinate to arc } PQ)$$

and 

$$\Delta A < \Delta x \times (\text{the greatest ordinate to arc } PQ).$$

So that 

$$\frac{\Delta A}{\Delta x} > \text{the least ordinate to arc } PQ,$$

and 

$$\frac{\Delta A}{\Delta x} < \text{the greatest ordinate to arc } PQ.$$

In the figure, the least ordinate is XP and the greatest is RQ. 
In any case as Δx → 0, the difference between the least ordinate 
and y = XP as well as the difference between the greatest ordinate 
and y = XP may be made to become and remain less than 
any assignable quantity, no matter how small. In other words, 
if Δx → 0, the limit of each ordinate in question is y = XP. 
Since, as Δx → 0, the value of ΔA/Δx remains between the values 
of the greatest and least ordinates to arc PQ, the limit of ΔA/Δx 
must be y = XP, the common limit of the two ordinates. 

Therefore, 

$$\lim_{\Delta x \to 0} \frac{\Delta A}{\Delta x} = y,$$

or 

$$\frac{dA}{dx} = y = f(x).$$

Hence, by integration, area 

$$A = \int y\,dx = \int f(x)\,dx.$$

Let 

$$\int f(x)\,dx = F(x) + C.$$

Then 

$$A = F(x) + C.$$

The value of the constant of integration, C, is determined by 
the initial conditions at x = OX₀ = a. When x = a, the point 
generating the curve is at P₀ and the ordinate X₀P₀ has swept 
over no area at all. In other words, when x = a, A = 0. 

Therefore, 

$$0 = F(a) + C,$$

and 

$$C = -F(a).$$

Substituting this value of C gives 

$$A = F(x) - F(a)$$

as the area generated by the ordinate when P moves from P₀ 
to any point P. 

Let OX₁ = b. Then the area P₀X₀X₁P₁ is 

$$A = F(b) - F(a).$$

34. Definite Integral.—Starting, then, with the curve y = f(x), 
the area between the curve, the x-axis, and the ordinates at x = 
a and x = b is expressed as 

$$A = \int_a^b f(x)\,dx = F(b) - F(a),$$

in which F(x) is the result of the integration ∫f(x)dx. The 
expression $\int_a^b f(x)\,dx$ is called the definite integral of f(x) 
between the limits a and b. 

It can be shown¹ that if the line X₀Xₙ be divided into segments 
of length Δx, and ordinates drawn at the points of division, and 
rectangles be completed by drawing horizontal lines from the 
top of each ordinate to the next ordinate as shown in Fig. 74, 
then as Δx → 0 the sum of the areas of the rectangles approaches 
as a limit the area between the curve, the x-axis, and the ordinates 
at X₀ and Xₙ. It follows that the limit of the sum of the 
areas of the rectangles, as the width of each approaches zero as a 
limit and their number increases without limit, is equal to the 
definite integral $\int_a^b f(x)\,dx$, where a = OX₀, b = OXₙ, and y = 
f(x) is the equation of the curve. 

¹ See works on Integral Calculus. 

In this manner, the definite integral is regarded as a limit of 
a sum and is useful in many problems of summation. 

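35. Illustrative Example.—As an example in integration, find 
the area under the curve in Fig. 71 from x = 1 to x = 3. The 
equation of the curve is 

$$y = x^3 - 6x^2 + 9x + 2.$$

The area is 

$$A = \int_1^3 (x^3 - 6x^2 + 9x + 2)\,dx$$

$$= \left[\frac{x^4}{4} - \frac{6x^3}{3} + \frac{9x^2}{2} + 2x + C\right]_1^3$$

$$= \left[\frac{3^4}{4} - \frac{6 \cdot 3^3}{3} + \frac{9 \cdot 3^2}{2} + 2 \cdot 3 + C\right] - \left[\frac{1}{4} - \frac{6}{3} + \frac{9}{2} + 2 + C\right]$$

$$= 8.$$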

The limits of integration, 1 and 3, written with the bracket 
indicate that 1 is to be substituted for x within the bracket and 
the result subtracted from the result of substituting 3 for x 
within the bracket. Notice that C, the constant of integration, 
will always cancel out in the subtraction. For this reason C 
may be neglected in connection with the definite integral. 
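
The statement that the definite integral is the limit of a sum of rectangles may be compared with the result just found. The Python sketch below is an editorial illustration, not part of the original text; the names `rectangle_sum` and `F` are hypothetical, and each rectangle is given the height of the ordinate at its left-hand edge, which is one reading of the construction described in Sec. 34.

```python
def f(x):
    """The curve of Fig. 71."""
    return x**3 - 6 * x**2 + 9 * x + 2


def F(x):
    """An integral of f(x), with the constant of integration neglected."""
    return x**4 / 4 - 2 * x**3 + 9 * x**2 / 2 + 2 * x


def rectangle_sum(func, a, b, n):
    """Sum of the areas of n rectangles of equal width between x = a and x = b."""
    dx = (b - a) / n
    return sum(func(a + i * dx) * dx for i in range(n))


print("F(3) - F(1) =", F(3) - F(1))
for n in (10, 100, 1000, 100000):
    print(n, rectangle_sum(f, 1, 3, n))
```

As the number of rectangles is increased and the width of each diminished, the printed sums draw toward the value F(3) − F(1).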


MA THEM A T1CAL TA BLES 


343 


SQUARES OP NUMBERS* 


N 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

<3 

1.00 

1.000 

1.002 

1.001 

1 006 

1.008 

1.010 

1.012 

1.014 

1.016 

1 018 

2 

1 

1 020 

1.022 

1.024 

1.026 

1 028 

1.030 

i .032 

1.034 

1 036 

1 038 


2 

1.040 

1.042 

1.044 

1.047 

1.049 

1.051 

1.053 

1.055 

1 057 

1 059 


3 

1 061 

1.063 

1.065 

1.067 

1.069 

1.071 

1.073 

1.075 

1 077 

1 080 


4 

1.082 

1.084 

1.086 

1.088 

1.090 

1.092 

1.094 

1.096 

1.098 

1.100 


105 

1.102 

1.105 

1.107 

1.109 

1.111 

1.113 

1.115 

1.117 

1.119 

1.121 


6 

1.124 

1.126 

1.128 

1.130 

1.132 

1.134 

1.136 

1.138 

1.141 

1 143 


7 

1.145 

1.147 

1.149 

1.151 

1.153 

1.156 

1.158 

1.160 

1.162 

1.164 


8 

1.166 

1.169 

1.171 

1.173 

1.175 

1.177 

1.179 

1.182 

1 184 

1.186 


9 

1.188 

1.190 

1.192 

1.195 

1.197 

1.199 

1.201 

1.203 

1.206 

1.208 


1.10 

1 210 

1.212 

1.214 

1.217 

1.219 

1.221 

1.223 

1.225 

1.228 

1.230 


1 

1.232 

1.234 

1.237 

1.239 

1.241 

1.243 

1.245 

1 248 

1.250 

1.252 


2 

1.254 

1.257 

1.259 

1.261 

1.263 

1.266 

1.268 

1 270 

1.272 

1.275 


3 

1.277 

1.279 

1.281 

1.284 

1.286 

1.288 

1.290 

1.293 

1.295 

1 297 


4 

1.300 

1.302 

1.304 

1.306 

1.309 

1.311 

1.313 

1.316 

1.318 

1.320 


1.16 

1.322 

1.325 

1.327 

1.329 

1.332 

1.334 

1.336 

1.339 

1.341 

1.343 


6 

1.346 

1.348 

1.350 

1353 

1.355 

1.357 

1.360 

1.362 

1.364 

1.367 


7 

1.369 

1.371 

1.374 

1.376 

1.378 

1.381 

1.383 

1.385 

1.388 

1.390 


8 

1.392 

1.395 

1.397 

1.399 

1.402 

1.404 

1.407 

1.409 

1 411 

1.414 


9 

1.416 

1.418 

1.421 

1.423 

1.426 

1.428 

1.430 

1.433 

1.435 

1.438 


1.20

1.440 

1.442 

1.445 

1.447 

1.450 

1.452 

1.454 

1.457 

1.459 

1.462 


1 

1.464 

1.467 

1.469 

1.471 

1.474 

1 476 

1.479 

1.481 

1.484 

1.486 


2 

1.488 

1.491 

1.493 

1.496 

1.498 

1.501 

1.503 

1 506 

1.508 

1.510 


3 

1.513 

1.515 

1.518 

1.520 

1.523 

1.525 

1.528 

1.530 

1.533 

1.535 


4 

1.538 

1.540 

1.543 

1.545 

1.548 

1.550 

1.553 

1.555 

1.558 

1.560 


1.25

1.562 

1.565 

1.568 

1.570 

1.573 

1.575 

1.578 

1.580 

1.583 

1 585 

3 

6

1.588 

1.590 

1.593 

1.595 

1.598

1.600 

1.603 

1.605 

1.608 

1.610 


7 

1 613 

1.615 

1.618 

1.621 

1.623 

1.626 

1.628 

1.631 

1.633 

1 636 


8 

1.638 

1.641 

1.644 

1.646 

1.649 

1.651 

1.654 

1 656 

1 659 

1.662 


9 

1.664 

1.667 

1.669 

1.672 

1.674 

1.677 

1.680 

1.682 

1.685 

1.687 


1.30 

1.690 

1.693 

1.695 

1.698

1.700 

1.703 

1.706 

1.708 

1.711 

1.713 


1 

1.716 

1.719 

1.721 

1.724 

1.727 

1.729 

1.732 

1.734 

1.737 

1 740 


2 

1.742 

1.745 

1.748 

1.750 

1.753 

1.756 

1.758 

1.761 

1.764 

1.766 


3 

1.769 

1.772 

1.774 

1.777 

1.780 

1.782 

1.785 

1.788 

1 790 

1 793 


4 

1.796 

1.798 

1.801 

1.804 

1.806 

1.809 

1.812 

1.814 

1.817 

1.820 


1.35 

1.822 

1.825 

1.828 

1.831

1.833 

1.836 

1.839 

1.841 

1.844 

1.847 


6 

1.850 

1.852 

1.855 

1.858 

1.860 

1.863 

1.866 

1.869 

1.871 

1 874 


7 

1.877 

1.880 

1.882 

1.885 

1 888 

1.891 

1.893 

1.896 

1.899 

1.902 


8 

1.904 

1.907 

1.910 

1.913 

1.915 

1.918 

1.921 

1.924 

1.927 

1.929 


9 

1.932 

1.935 

1.938 

1.940 

1.943 

1.946 

1.949 

1.952 

1.954 

1.957 


1.40 

1.960 

1.963 

1.966 

1.968 

1.971 

1.974 

1.977 

1.980 

1.982 

1.985 


1 

1.988 

1.991 

1.994 

1.997 

1.999 

2.002 

2.005

2.008

2.011

2.014 


2 

2.016

2.019

2.022 

2.025 

2.028 

2.031

2.033

2.036 

2.039 

2.042


3 

2.045

2.048

2.051

2.053

2.056 

2.059 

2.062 

2.065

2.068

2.071 


4 

2.074

2.076

2.079

2.082

2.085

2.088

2.091

2.094

2.097

2.100


1.45

2.102

2.105

2.108

2.111

2.114

2.117

2.120

2.123

2.126

2.129


6 

2.132

2.135

2.137

2.140

2.143

2.146

2.149

2.152

2.155

2.158


7 

2.161 

2.164 

2.167 

2.170 

2.173 

2.176 

2.179 

2.182 

2.184 

2.187 


8 

2.190

2.193

2.196

2.199

2.202

2.205

2.208

2.211

2.214

2.217


9 

2.220

2.223

2.226

2.229

2.232

2.235

2.238

2.241

2.244

2.247



Moving the decimal point ONE place in N requires moving it TWO places in body of table.


* Tables on pages 343 to 352 inclusive are reprinted from Marks' 
Mechanical Engineers' Handbook. 





344 




SQUARES (continued)


1.50

2.250

2.253 

2.256 

2.259 

2.262 

2.265 

1 

2280 

2.283 

2.286 

2 289 

2.292 

2.295 

2 

2310 

2.313 

2.316 

2.320 

2323 

2.326 

3 

2341 

2.344 

2.347 

2.350 

2353 

2.356 

4 

2372 

2375 

2.378 

2381 

2384 

2387 

1.55

2.402 

2406 

2.409 

2.412 

2.415 

2.418 

6 

2434 

2437 

2.440 

2.443 

2.446 

2.449 

7 

2465 

2.468 

2.471 

2474 

2.477 

2481 

8 

2496 

2.500 

2.503 

2.506 

2.509 

2.512 

9 

2528 

2531 

2534 

2.538 

2541 

2.544 

1.60 

2560 

2563 

2.566 

2.570 

2573 

2.576 

1 

2592 

2595 

2.599 

2.602 

2.605 

2.608 

2 

2624 

2.628 

2.631 

2.634 

2.637 

2.641 

3 

2.657 

2.660 

2.663 

2.667 

2.670 

2.673 

4 

2.690 

2693 

2696 

2699 

2703 

2.706 

1.65 

2.722 

2726 

2.729 

2.732 

2.736 

2.739 

6 

2.756 

2.759 

2 762 

2.766 

2.769 

2 772 

7 

2.789 

2.792 

2.796 

2.799 

2.802 

2.806 

8 

2.822 

2.826 

2.829 

2.832 

2 836 

2.839 

9 

2856 

2.859 

2.863 

2.866 

2.870 

2.873 

1.70 

2.890 

2893 

2.897 

2.900 

2.904 

2.907 

1 

2 924 

2928 

2.931 

2934 

2.938 

2.941 

2 

2958 

2.962 

2.965 

2.969 

2.972 

2.976 

3 

2.993 

2.996 

3.000 

3.003 

3.007 

3.010 

4 

3.028 

3.031 

3.035 

3.038 

3.042 

3.045 

1.75 

3.062 

3.066 

3.070 

3.073 

3.077 

3.080 

6 

3.098 

3.101 

3.105 

3.108 

3.112 

3.115 

7 

3.133 

3.136 

3.140 

3.144 

3.147 

3.151 

8 

3.168 

3.172 

3.176 

3.179 

3.183 

3.186 

9 

3.204 

3.208 

3.211 

3.215 

3.218 

3.222 

1.80 

3.240 

3.244 

3.247 

3251 

3.254 

3.258 

1 

3.276 

3.280 

3.283 

3.287 

3.291 

3.294 

2 

3.312 

3.316 

3.320 

3.323 

3.327 

3.331 

3 

3.349 

3.353 

3.356 

3.360 

3.364 

3.367 

4 

3.386 

3.389 

3.393 

3.397 

3.400 

3.404 

1.85 

3.422 

3.426 

3.430 

3.434 

3.437 

3.441 

6 

3.460 

3.463 

3.467 

3.471 

3.474 

3.478 

7 

3.497 

3.501 

3.504 

3.508 

3.512 

3.516 

8 

3.534 

3.538 

3.542 

3.546 

3.549 

3.553 

9 

3.572 

3.576 

3.580 

3.583 

3.587 

3.591 

1.90 

3.610 

3.614 

3.618 

3.621 

3.625 

3.629 

1 

3.648 

3.652 

3.656 

3.660 

3.663 

3.667 

2 

3.686 

3.690 

3.694 

3.698 

3.702 

3.706 

3 

3.725 

3.729 

3.733 

3.736 

3.740 

3.744 

4 

3.764 

3.767 

3.771 

3.775 

3.779 

3.783 

1.95 

3.802 

3.806 

3.810 

3.814 

3.818 

3.822 

6 

3 842 

3.846 

3.849 

3.853 

3.857 

3.861 

7 

3.881 

3.885 

3.889 

3.893 

3.897 

3.901 

8 

3.920 

3.924 

3.928 

3.932 

3.936 

3.940 

9 

3.960 

3.964 

3.968 

3.972 

3.976 

3.980 


2268 

2.298 
2329 
2359 
2390 

2421 

2.452 

2484 

2.515 

2547 

2579 

2611 

2644 

2.676 

2.709 

2742 

2776 

2.809 

2.843 

2876 

2910 

2945 

2979 

3.014 

3.049 

3.084 

3.119 

3.154 

3.190 

3.226 

3.262 

3.298 
3.334 
3.371 
3.408 

3.445 

3.482 

3.519 

3.557 

3.595 

3.633 

3.671 

3.709 
3.748 
3.787 

3.826 

3.865 

3.905 

3.944 

3.984 


2271 

2.301 
2332 
2.362 
2393 

2.424 

2.455 

2487 

2519 

2.550 

2582 

2.615 

2.647 

2.680 

2713 

2.746 

2.779 

2.812 

2.846 

2880 

2.914 

2.948 
2.983 
3.017 
3.052 

3.087 

3.122 

3.158 

3.193 

3.229 

3.265 

3.301 
3.338 
3.375 
3.411 

3.448 

3.486 

3.523 

3.561 

3.599 

3.637 

3.675 

3.713 

3.752 

3.791 

3830 

3.869 

3.909 

3.948 
3.988 


2274 

2.304 
2335 
2.365 
2.396

2427 

2.459 

2.490 

2.522 

2554 

2.586 
2.618 
2 650 
2683 
2716 

2.749 
2.782 
2.816 
2 849 
2883 

2.917 

2.952 
2.986 
3.021 
3.056 

3.091 

3.126 

3.161 

3.197 

3.233 

3.269 

3.305 
3.342 
3.378 
3.415 

3.452 

3.489 

3.527 

3.565 

3.602 

3.640 

3.679 

3.717 

3.756 

3.795 

3.834 

3.873 

3.912 

3.952 
3.992 


2.277 

2.307 

2.338 

2369 

2.399 

2430 

2462 

2.493 
2.525 
2557 

2589 

2621 

2.654 

2686 

2719 

2.752 

2.786 

2819 

2.853 

2887 

2.921 

2955 

2.989 

3.024 

3.059 

3.094 

3.129 

3.165 

3201 

3.236 

3.272 

3.309 

3.345 

3.382 

3.419 

3.456 

3.493 
3.531 
3.568 
3.606 

3.644 

3.683 

3.721 

3.760 

3.799 

3.838 

3.877 

3.916 

3.956 

3.996 








345 


SQUARES (continued) 


N 

0 

1

2

3

4

5

6

7

8

9 

2.00 

4.000

4.004 

4.008 

4.012 

4.016

4.020 

4.024 

4.028 

4.032 

4.036 

1

4.040 

4.044 

4 048 

4.052 

4.056 

4.060 

4.064 

4.068 

4.072 

4.076 

2 

4.080 

4.084 

4.088 

4.093 

4.097 

4.101 

4.105 

4.109 

4.113 

4.117 

3 

4.121 

4.125 

4.129 

4.133 

4.137 

4.141 

4.145 

4.149 

4.153 

4.158 

4 

4.162 

4.166 

4.170 

4.174 

4.178 

4.182 

4.186 

4.190 

4.194 

4.198 

2.05 

4.202 

4.207 

4.211

4.215 

4.219 

4.223 

4.227 

4.231

4.235 

4.239 

6 

4.244 

4.248 

4.252 

4.256 

4.260 

4.264 

4.268 

4.272 

4.277 

4.281 

7 

4.285 

4.289 

4.293 

4.297 

4.301 

4.306 

4.310 

4.314 

4.318 

4.322 

8 

4.326 

4.331 

4.335 

4.339 

4.343 

4.347 

4.351 

4.356 

4.360 

4.364 

9 

4.368 

4.372 

4.376 

4.381 

4.385 

4.389 

4.393 

4.397 

4.402 

4.406 

2.10 

4.410 

4.414 

4.418 

4.423 

4.427 

4.431 

4.435 

4.439 

4.444 

4.448 

1 

4.452 

4.456 

4.461 

4.465 

4.469 

4.473 

4.477 

4.482 

4 486 

4.490 

2 

4.494 

4.499 

4.503 

4.507 

4.511 

4.516 

4.520 

4.524 

4.528 

4 533 

3 

4.537 

4.541 

4.545 

4.550 

4.554 

4.558 

4.562 

4.567 

4.571 

4.575 

4 

4.580 

4.584 

4.588 

4.592 

4.597 

4.601 

4.605 

4.610 

4.614 

4.618 

2.15

4.622 

4.627 

4.631 

4.635 

4.640 

4.644 

4.648 

4.653 

4.657 

4.661 

6 

4.666 

4.670 

4.674 

4.679 

4.683 

4.687 

4.692 

4.696 

4.700 

4.705 

7 

4.709 

4.713 

4.718 

4.722 

4.726 

4.731 

4.735 

4.739 

4.744 

4.748 

8 

4.752 

4.757 

4.761 

4.765 

4.770 

4.774 

4.779 

4.783 

4.787 

4.792 

9 

4.796 

4.800 

4.805 

4.809 

4.814 

4.818 

4.822 

4.827 

4.831 

4.836 

2.20 

4.840 

4.844 

4.849 

4.853 

4.858 

4.862 

4.866 

4.871

4.875 

4.880 

1 

4.884 

4.889 

4.893

4.897 

4.902 

4.906 

4.911 

4.915 

4.920 

4.924 

2 

4.928 

4.933 

4.937 

4.942 

4.946 

4.951 

4.955 

4.960 

4.964 

4.968 

3 

4.973 

4.977 

4.982 

4.986 

4.991 

4.995 

5.000 

5.004 

5.009 

5.013 

4 

5.018 

5.022 

5.027 

5.031 

5.036 

5.040 

5.045 

5.049 

5.054 

5.058 

2.25

5.062 

5.067 

5.072 

5.076 

5.081 

5.085 

5.090 

5.094 

5.099 

5.103 

6 

5.108 

5.112 

5.117 

5.121 

5.126 

5.130 

5.135 

5.139 

5.144 

5 148 

7 

5.153 

5.157 

5.162 

5.167 

5.171 

5.176 

5.180 

5.185 

5.189 

5.194 

8 

5.198 

5.203 

5.208 

5.212 

5.217 

5.221 

5.226 

5.230 

5 235 

5.240 

9 

5.244 

5.249 

5.253 

5.258 

5.262 

5.267 

5.272 

5.276 

5.281 

5.285 

2.30 

5.290 

5.295 

5.299 

5.304 

5.308 

5.313 

5.318 

5.322 

5.327 

5.331 

1 

5.336 

5.341 

5.345 

5.350 

5.355 

5.359 

5.364 

5.368 

5.373 

5.378 

2 

5.382 

5.387 

5.392 

5.396 

5.401 

5.406 

5.410 

5.415 

5.420 

5.424 

3 

5.429 

5.434 

5.438 

5.443 

5.448 

5.452 

5.457 

5.462 

5.466 

5.471 

4 

5.476 

5.480 

5.485 

5.490 

5.494 

5.499 

5.504 

5.508 

5.513 

5.518 

2.35 

5.522 

5.527 

5.532 

5.537 

5.541 

5.546 

5.551 

5.555 

5.560 

5.565 

6 

5.570 

5.574 

5.579 

5.584 

5.588 

5.593 

5.598 

5.603 

5.607 

5.612 

7 

5.617 

5.622 

5.626 

5 631 

5.636 

5.641 

5.645 

5.650 

5.655 

5.660 

8 

5.664 

5.669 

5.674 

5.679 

5.683 

5.688 

5.693 

5.698 

5.703 

5.707 

9 

5.712 

5.717 

5.722 

5.726 

5.731 

5.736 

5.741 

5.746 

5.750 

5.755 

2.40 

5.760 

5.765 

5.770 

5.774 

5.779 

5.784 

5.789 

5.794 

5.798 

5.803 

1 

5.808 

5.813 

5.818 

5.823 

5.827 

5.832 

5.837 

5.842 

5.847 

5.852 

2 

5.856 

5.861 

5.866 

5.871 

5.876 

5.881 

5.885 

5.890 

5.895

5.900 

3 

5.905 

5.910 

5.915 

5.919 

5.924 

5.929 

5.934 

5.939 

5.944 

5.949 

4 

5.954 

5.958 

5.963 

5.968 

5.973 

5.978 

5.983 

5.988 

5.993 

5.998 

2.45 

6.002 

6.007 

6.012 

6.017 

6.022 

6.027 

6.032 

6.037 

6.042 

6 047 

6 

6.052 

6.057 

6.061 

6.066 

6.071 

6.076 

6.081 

6.086 

6.091 

6.096 

7 

6.101 

6.106 

6.111 

6.116 

6.121 

6.126 

6.131 

6.136 

6.140 

6.145

8 

6.150

6.155 

6.160 

6.165 

6.170 

6.175 

6.180 

6.185 

6.190 

6.195 

9 

6.200 

6.205 

6.210 

6.215 

6.220 

6.225 

6.230 

6.235 

6.240 

6.245 


Moving the decimal point ONE place in N requires moving it TWO places in body
of table.


Avg. 



346 




SQUARES ( continued) 









347 


SQUARES (continued)


N 

0 

1 

2 

3 

4 

5 

6 

7 

8

9 

• • 

£ 


3.00

9.000

9.006

9.012

9.018

9.024

9.030

9.036

9.042

9.048 

9.054 

6 

1 

9.060

9.066 

9.072 

9.078 

9.084 

9.090

9.096 

9.102 

9.108 

9.114 


2 

9.120 

9.126 

9.132 

9.139 

9.145 

9.151 

9.157 

9.163 

9.169 

9.175 


3 

9.181 

9.187 

9.193 

9.199 

9.205 

9.211 

9.217 

9.223 

9.229 

9.236 


4 

9.242 

9.248 

9.254 

9.260 

9.266 

9.272 

9.278

9.284

9.290

9.296


3.05

9.302 

9.309 

9.315 

9.321 

9.327 

9.333 

9.339 

9.345 

9.351 

9.357


6 

9.364 

9.370 

9.376 

9.382 

9.388 

9.394

9.400 

9.406 

9.413 

9.419 


7 

9.425 

9.431 

9.437 

9.443 

9.449 

9.456 

9.462 

9.468 

9.474 

9.480 


8 

9.486 

9.493 

9.499 

9.505 

9.511 

9.517 

9.523

9.530

9.536

9.542


9 

9.548 

9.554

9.560

9.567 

9.573 

9.579 

9.585 

9.591

9.598

9.604 


3.10 

9.610 

9.616 

9.622 

9.629 

9.635 

9.641

9.647

9.653

9.660 

9.666 


1 

9.672 

9.678 

9.685 

9.691 

9.697 

9.703 

9.709 

9.716

9.722 

9.728 


2 

9.734 

9.741

9.747 

9.753 

9.759 

9 766 

9.772 

9.778 

9.784 

9.791 


3 

9.797 

9.803 

9.809 

9.816 

9.822 

9.828 

9.834 

9 841 

9.847 

9.853 


4 

9.860 

9.866 

9.872 

9.878 

9.885 

9.891 

9.897 

9.904 

9.910 

9.916 


3.15

9.922 

9.929 

9.935 

9.941

9.948 

9.954 

9.960 

9.967 

9.973 

9.979 


6 

9.986 

9.992 

9.998 

10.005 







6 

3.1 







9.99 

10.05 

10.11 

10.18

6 

2 

10.24 

10.30 

10.37 

10.43 

10.50 

10.56 

10.63

10.69 

10.76 

10.82 


3 

10.89 

10.96 

11.02 

11.09

11.16 

11.22 

11.29 

11.36

11.42 

11.49 

7 

4 

11.56 

11.63 

11.70 

11.76 

11.83 

11.90 

11.97 

12.04 

12.11 

12.18 


3.5

12.25 

12.32 

12.39 

12.46 

12.53 

12.60 

12.67 

12.74 

12.82 

12.89 


6 

12.96 

13.03 

13.10 

13.18 

13.25 

13.32

13.40 

13.47 

13.54 

13.62 


7 

13.69 

13.76 

13.84 

13.91 

13.99 

14.06 

14.14 

14.21 

14.29 

14.36 

8 

8 

14.44 

14.52 

14.59 

14.67 

14.75 

14.82 

14.90 

14.98 

15.05 

15.13 


9 

15.21 

15.29

15.37

15.44

15.52

15.60 

15.68 

15.76 

15.84 

15.92 


4.0 

16.00 

16.08

16.16 

16.24 

16.32 

16.40 

16.48 

16 56 

16.65 

16.73 


1 

16.81 

16.89 

16.97 

17.06 

17.14 

17.22 

17.31 

17.39 

17.47 

17.56


2 

17.64 

17.72 

17.81 

17.89 

17.98 

18.06 

18.15 

18.23 

18.32

18.40 


3 

18.49 

18.58

18.66 

18.75 

18.84 

18.92 

19.01 

19.10 

19.18 

19.27 

9 

4 

19.36

19.45 

19.54 

19.62 

19.71 

19.80 

19.89 

19.98 

20.07 

20.16 


4.5

20.25 

20.34 

20.43 

20.52 

20.61 

20.70 

20.79 

20.88 

20.98 

21 07 


6 

21.16 

21.25 

21.34 

21.44 

21.53 

21.62 

21.72 

21.81 

21.90 

22 00 


7 

22.09 

22.18 

22.28 

22.37 

22.47 

22.56 

22.66 

22.75 

22.85 

22.94 

10 

8 

23.04 

23.14 

23.23 

23.33 

23.43 

23.52 

23.62 

23.72 

23.81 

23.91 


9 

24.01 

24.11 

24.21 

24.30

24.40 

24.50 

24.60 

24.70 

24.80 

24.90 



$\pi^2 = 9.86960$   $(\pi/2)^2 = 2.46740$   $1/\pi^2 = 0.101321$


Explanation of Table of Squares. 

This table gives the value of $N^2$ for values of $N$ from 1 to 10, correct to four figures.
(Interpolated values may be in error by 1 in the fourth figure.)

To find the square of a number $N$ outside the range from 1 to 10, note that
moving the decimal point one place in column $N$ is equivalent to moving it two places
in the body of the table. For example:

$(3.217)^2 = 10.35$; $(0.03217)^2 = 0.001035$; $(3217)^2 = 10350000$

This table can also be used inversely, to give square roots. 
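
The decimal-point rule can be sketched in a few lines of code. This is an illustration only: the function name is hypothetical, and the table lookup is imitated by rounding the computed square to four figures rather than by reading an actual table entry.

```python
import math

def square_from_table(x):
    """Square x the way the table is used: look up N in [1, 10), then shift the point."""
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    k = math.floor(math.log10(x))      # places the decimal point was moved
    n = x / 10**k                      # N, scaled into the range 1 to 10
    body = float(f"{n * n:.4g}")       # stand-in for the four-figure table entry
    return body * 10 ** (2 * k)        # one place in N is two places in the body

# The three examples from the explanation of the table.
for value in (3.217, 0.03217, 3217):
    print(value, square_from_table(value))
```

Used inversely, the same shift applies to square roots: an even number of places is moved in the body for each single place in N.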












348 




SQUARES (continued)







349 


COMMON LOGARITHMS (special table)



<-0 


1.00 0.0000

1.01 0043 

1.02 0086 

1.03 0128 

1.04 0170 

1.05 0212 

1.06 0253 

1.07 0294 

1.08 0334
1.09 0374


1.10 

1.11 

1.12 

1.13 

1.14 


0.0414 

0453 

0492 

0531 

0569 

0607 

0645 

0682

0719 

0755 


0004 

0009 

0013 

0017 

0022 

0048 

0052 

0056 

0060 

0065 

0090 

0095 

0099 

0103 

0107 

0133 

0137 

0141 

0145 

0149 

0175 

0179 

0183 

0187 

0191 

0216 

0220 

0224 

0228 

0233

0257 

0261 

0265 

0269 

0273

0298

0302

0306

0310

0314

0338 

0342

0346

0350

0354 

0378 

0382 

0386 

0390 

0394 

0418 

0422 

0426 

0430 

0434 

0457 

0461 

0465 

0469 

0473

0496 

0500 

0504

0508 

0512 

0535 

0538 

0542 

0546 

0550 

0573 

0577 

0580 

0584 

0588 

0611 

0615

0618 

0622 

0626 

0648 

0652 

0656 

0660 

0663 

0686 

0689 

0693 

0697 

0700 

0722 

0726 

0730 

0734 

0737 

0759 

0763 

0766 

0770 

0774 


0026 

0069 

0111 

0154 

0195 

0237 

0278 

0318 

0358 

0398 

0438 

0477

0515 

0554 

0592 

0630

0667 

0704 

0741 

0777 


0030 

0073 

0116 

0158 

0199 

0241 

0282 

0322

0362

0402 

0441 

0481

0519 

0558 

0596 

0633

0671

0708

0745 

0781 


0035 

0077 

0120 

0162 

0204 

0245 

0286 

0326

0366

0406

0445 

0484 

0523

0561 

0599 

0637 

0674 

0711 

0748 

0785 


0039 

0082 

0124 

0166 

0208 

0249 

0290 

0330 

0370 

0410 

0449 

0488 

0527 

0565 

0603 

0641 

0678 

0715 

0752 

0788 


1.30 

1.31

1.32

1.33

1.34


1.40 

1.41 

1.42 

1.43 

1.44 


0.0792 

0795 

0799 

0803

0806

0810

0813 

0817 

0821 

0824 

0828 

0831 

0835 

0839

0842 

0846 

0849 

0853

0856 

0860 

0864

0867

0871

0874

0878

0881

0885

0888

0892

0896 

0899 

0903 

0906 

0910 

0913 

0917 

0920 

0924 

0927 

0931 

0934 

0938 

0941 

0945 

0948 

0952 

0955 

0959 

0962 

0966 

0969 

0973 

0976 

0980 

0983 

0986 

0990 

0993

0997 

1000 

1004 

1007 

1011 

1014 

1017 

1021 

1024 

1028 

1031 

1035 

1038 

1041 

1045 

1048 

1052 

1055 

1059 

1062 

1065 

1069 

1072 

1075 

1079 

1082 

1086 

1089 

1092 

1096 

1099 

1103 

1106 

1109 

1113 

1116 

1119 

1123 

1126 

1129 

1133 

1136 

0.1139 

1143 

1146 

1149 

1153 

1156 

1159 

1163 

1166 

1169 

1173 

1176 

1179 

1183 

1186 

1189 

1193

1196 

1199 

1202 

1206 

1209 

1212 

1216 

1219 

1222 

1225 

1229 

1232 

1235 

1239 

1242 

1245 

1248

1252 

1255 

1258 

1261 

1265 

1268 

1271 

1274 

1278 

1281 

1284 

1287 

1290 

1294 

1297 

1300 

1303 

1307 

1310 

1313 

1316 

1319 

1323 

1326 

1329 

1332 

1335 

1339 

1342 

1345 

1348 

1351 

1355 

1358

1361 

1364 

1367 

1370 

1374 

1377 

1380 

1383

1386 

1389 

1392

1396 

1399 

1402 

1405 

1408 

1411 

1414 

1418 

1421 

1424 

1427 

1430 

1433 

1436 

1440 

1443 

1446 

1449 

1452 

1455 

1458 

0.1461 

1464 

1467 

1471 

1474 

1477 

1480 

1483 

1486

1489 

1492 

1495 

1498 

1501 

1504 

1508 

1511 

1514 

1517 

1520 

1523 

1526 

1529 

1532 

1535 

1538 

1541 

1544 

1547 

1550 

1553 

1556 

1559 

1562 

1565 

1569 

1572 

1575 

1578 

1581 

1584 

1587 

1590 

1593 

1596 

1599 

1602 

1605 

1608 

1611 

1614 

1617 

1620 

1623 

1626 

1629 

1632 

1635 

1638 

1641 

1644 

1647 

1649 

1652 

1655 

1658 

1661 

1664 

1667 

1670 

1673 

1676 

1679 

1682 

1685 

1688 

1691 

1694 

1697 

1700 

1703 

1706 

1708 

1711 

1714 

1717 

1720 

1723 

1726 

1729 

1732 

1735 

1738 

1741 

1744 

1746 

1749 

1752 

1755 

1758 


Moving the decimal point n places to the right [or left] in the number requires adding +n
[or -n] in the body of the table.



350 




COMMON LOGARITHMS (special table, continued) 



P 


1.50 

1.51 

1.52

1.53

1.54

1.55

1.56

1.57
1.58
1.59

1.60 

1.61 

1.62 

1.63 

1.64 

1.65 

1.66 

1.67 

1.68 

1.69 

1.70 

1.71 

1.72 

1.73 

1.74 

1.75 

1.76 

1.77 

1.78 

1.79 

1.80 

1.81 

1.82 

1.83 

1.84 

1.85 

1.86 

1.87 

1.88 

1.89 

1.90 

1.91 

1.92 

1.93 

1.94 

1.95 

1.96 

1.97 

1.98 

1.99 


0.1761 

1764 

1767 

1770 

1772 

1775 

1778 

1781 

1784 

1787 

1790 

1793 

1796 

1798 

1801 

1804 

1807 

1810 

1813 

1816 

1818 

1821 

1824 

1827 

1830 

1833 

1836 

1838 

1841 

1844 

1847 

1850 

1853 

1855 

1858 

1861 

1864 

1867 

1870 

1872 

1875 

1878 

1881 

1884 

1886 

1889 

1892 

1895 

1898 

1901 

1903 

1906 

1909 

1912 

1915 

1917 

1920 

1923 

1926 

1928 

1931 

1934 

1937 

1940 

1942 

1945 

1948 

1951 

1953 

1956 

1959 

1962 

1965 

1967 

1970 

1973 

1976 

1978 

1981 

1984 

1987 

1989 

1992 

1995 

1998 

2000 

2003 

2006 

2009 

2011 

2014 

2017 

2019 

2022 

2025 

2028 

2030 

2033 

2036 

2038 

0.2041 

2044 

2047 

2049 

2052 

2055 

2057 

2060 

2063 

2066 

2068 

2071 

2074 

2076 

2079 

2082 

2084 

2087 

2090 

2092 

2095 

2098 

2101 

2103 

2106 

2109 

2111 

2114 

2117 

2119 

2122 

2125 

2127 

2130 

2133 

2135 

2138 

2140 

2143 

2146 

2148 

2151 

2154 

2156 

2159 

2162 

2164 

2167 

2170 

2172 

2175 

2177 

2180 

2183 

2185 

2188 

2191 

2193 

2196 

2198 

2201 

2204 

2206 

2209 

2212 

2214 

2217 

2219 

2222 

2225 

2227 

2230 

2232 

2235 

2238 

2240 

2243 

2245 

2248 

2251 

2253 

2256 

2258 

2261 

2263 

2266 

2269 

2271 

2274 

2276 

2279 

2281 

2284 

2287 

2289 

2292 

2294 

2297 

2299 

2302 

0.2304 

2307 

2310 

2312 

2315 

2317 

2320 

2322 

2325 

2327 

2330 

2333 

2335 

2338 

2340 

2343 

2345 

2348 

2350 

2353 

2355 

2358 

2360 

2363 

2365 

2368 

2370 

2373 

2375 

2378 

2380 

2383 

2385 

2388 

2390 

2393 

2395 

2398 

2400 

2403 

2405 

2408 

2410 

2413 

2415 

2418 

2420 

2423 

2425 

2428 

2430 

2433 

2435 

2438 

2440 

2443 

2445 

2448 

2450 

2453 

2455 

2458 

2460 

2463 

2465 

2467 

2470 

2472 

2475 

2477 

2480 

2482 

2485 

2487 

2490 

2492 

2494 

2497 

2499 

2502 

2504 

2507 

2509 

2512 

2514 

2516 

2519 

2521 

2524 

2526 

2529 

2531 

2533 

2536 

2538 

2541 

2543 

2545 

2548 

2550 

0.2553 

2555 

2558 

2560 

2562 

2565 

2567 

2570 

2572 

2574 

2577 

2579 

2582 

2584 

2586 

2589 

2591 

2594 

2596 

2598 

2601 

2603 

2605 

2608 

2610 

2613 

2615 

2617 

2620 

2622 

2625 

2627 

2629 

2632 

2634 

2636 

2639 

2641 

2643

2646 

2648 

2651 

2653 

2655 

2658 

2660 

2662 

2665 

2667 

2669 

2672 

2674 

2676 

2679 

2681 

2683 

2686 

2688 

2690 

2693 

2695 

2697 

2700 

2702 

2704 

2707 

2709 

2711 

2714 

2716 

2718 

2721 

2723 

2725 

2728 

2730 

2732 

2735 

2737 

2739 

2742 

2744 

2746 

2749 

2751 

2753 

2755 

2758 

2760 

2762 

2765 

2767 

2769 

2772 

2774 

2776 

2778 

2781 

2783 

2785 

0.2788 

2790 

2792 

2794 

2797 

2799 

2801 

2804 

2806 

2808 

2810 

2813 

2815 

2817 

2819 

2822 

2824 

2826 

2828 

2831 

2833 

2835 

2838 

2840 

2842 

2844 

2847 

2849 

2851 

2853 

2856 

2858 

2860 

2862 

2865 

2867 

2869 

2871 

2874 

2876 

2878 

2880 

2882 

2885 

2887 

2889 

2891 

2894 

2896 

2898 

2900 

2903 

2905 

2907 

2909 

2911 

2914 

2916 

2918 

2920 

2923 

2925 

2927 

2929 

2931 

2934 

2936 

2938 

2940 

2942 

>|A/ M 

2945 

2947 

2949 

2951 

2953 

2956 

2958 

2960 

2962 

2964 

2967 

2969 

2971 

2973 

2975 

2978 

2980

2982 

2984 

2986 

2989 

2991 

2993 

2995 

2997 

2999 

3002 

3004 

3006 

3008







351 


COMMON LOGARITHMS 


N

0

1

2

3

4 

5 

6 

7 

8 

9 

& 

1.0 

0.0000 

0043 

0086 

0128 

0170 

0212 

0253 

0294 

0334 

0374 


1.1

0414 

0453 

0492 

0531 

0569 

0607 

0645 

0682 

0719 

0755 


1.2

0792 

0828 

0864 

0899 

0934 

0969 

1004 

1038 

1072 

1106 


1.3

1139 

1173 

1206 

1239 

1271 

1303 

1335 

1367 

1399 

1430 


1.4 

1461 

1492 

1523 

1553

1584 

1614 

1644 

1673 

1703 

1732 


1.5

1761 

1790 

1818 

1847 

1875 

1903 

1931 

1959 

1987 

2014 


1.6 

2041 

2068 

2095 

2122 

2148 

2175 

2201 

2227 

2253 

2279 


1.7

2304 

2330 

2355 

2380 

2405 

2430 

2455 

2480

2504 

2529 


1.8 

2553 

2577 

2601 

2625 

2648 

2672 

2695 

2718 

2742 

2765 


1.9 

2788 

2810 

2833 

2856 

2878 

2900 

2923 

2945 

2967 

2989 



2.0

0.3010

3032 

3054 

3075 

3096 

3118 

3139 

3160 

3181 

3201 

21 

2.1 

3222 

3243 

3263 

3284 

3304 

3324 

3345 

3365 

3385 

3404 

20 

2.2

3424 

3444 

3464 

3483 

3502 

3522 

3541 

3560 

3579

3598 

19 

2.3

3617 

3636 

3655 

3674 

3692 

3711 

3729 

3747 

3766 

3784 

18 

2.4 

3802 

3820 

3838 

3856 

3874 

3892 

3909 

3927 

3945 

3962 

17 

2.5

3979 

3997 

4014 

4031 

4048 

4065 

4082 

4099 

4116 

4133 

17 

2.6 

4150 

4166 

4183 

4200 

4216 

4232 

4249 

4265 

4281 

4298 

16 

2.7

4314 

4330 

4346 

4362 

4378 

4393 

4409 

4425 

4440 

4456 

16 

2.8 

4472 

4487 

4502 

4518 

4533 

4548 

4564 

4579 

4594 

4609 

15 

2.9

4624 

4639 

4654 

4669 

4683 

4698 

4713 

4728 

4742 

4757 

15 

3.0

0.4771 

4786 

4800 

4814 

4829 

4843 

4857 

4871 

4886 

4900 

14 

3.1 

4914 

4928 

4942 

4955 

4969 

4983 

4997 

5011 

5024 

5038 

14 

3.2

5051 

5065 

5079 

5092 

5105 

5119 

5132 

5145 

5159 

5172

13 

3.3 

5185 

5198 

5211 

5224 

5237 

5250 

5263 

5276 

5289 

5302 

13 

3.4 

5315 

5328 

5340 

5353 

5366 

5378 

5391 

5403 

5416 

5428 

13 

3.5

5441 

5453 

5465 

5478 

5490 

5502 

5514 

5527 

5539 

5551 

12 

3.6 

5563 

5575 

5587 

5599 

5611 

5623 

5635 

5647 

5658 

5670 

12 

3.7 

5682 

5694 

5705 

5717 

5729 

5740 

5752 

5763 

5775 

5786 

12 

3.8 

5798 

5809 

5821 

5832 

5843 

5855 

5866 

5877 

5888 

5899 

11

3.9 

5911 

5922 

5933 

5944 

5955 

5966 

5977 

5988 

5999 

6010 

11 

4.0 

0.6021 

6031 

6042 

6053 

6064 

6075 

6085 

6096 

6107 

6117 

1< 

4.1 

6128 

6138 

6149 

6160 

6170 

6180 

6191 

6201 

6212 

6222 

10 

4.2 

6232 

6243 

6253 

6263 

6274 

6284 

6294 

6304 

6314 

6325 

10 

4.3

6335 

6345 

6355 

6365 

6375 

6385 

6395 

6405 

6415 

6425 

10 

4.4 

6435 

6444 

6454 

6464 

6474 

6484 

6493 

6503 

6513 

6522 

10 

4.5

6532 

6542 

6551 

6561 

6571 

6580 

6590 

6599 

6609 

6618 

10 

4.6 

6628 

6637 

6646 

6656 

6665 

6675 

6684 

6693 

6702 

6712 

10 

4.7

6721 

6730 

6739 

6749 

6758 

6767 

6776 

6785 

6794 

6803 

9 

4.8 

6812 

6821 

6830

6839 

6848 

6857 

6866 

6875 

6884 

6893 

9 

4.9 

6902 

6911 

6920 

6928 

6937 

6946 

6955 

6964 

6972 

6981 

9 

$\log \pi = 0.4971$   $\log (\pi/2) = 0.1961$   $\log \pi^2 = 0.9943$   $\log \sqrt{\pi} = 0.2486$
$\log e = 0.4343$   $\log (0.4343) = 0.6378 - 1$



These two pages give the common logarithms of numbers between 1 and 10, correct
to four places. Moving the decimal point n places to the right [or left] in the number is
equivalent to adding n [or -n] to the logarithm. Thus, log 0.017453 = 0.2419 - 2,
which may also be written $\bar{2}.2419$ or 8.2419 - 10.

$\log (ab) = \log a + \log b$

$\log (a/b) = \log a - \log b$

$\log (a^N) = N \log a$

$\log (\sqrt[N]{a}) = \frac{1}{N} \log a$
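
The use of the table for numbers outside the range 1 to 10, together with the rules for products, quotients, powers, and roots, can be checked with a short sketch. The function name and the sample values a, b, and n are assumptions made for illustration, and the library logarithm stands in for a table lookup.

```python
import math

def characteristic_and_mantissa(x):
    """Split log10(x) into an integer characteristic and a mantissa in [0, 1)."""
    log_x = math.log10(x)
    characteristic = math.floor(log_x)
    mantissa = log_x - characteristic
    return characteristic, round(mantissa, 4)

# log 0.017453 = 0.2419 - 2: mantissa from the table, characteristic from the shift.
print(characteristic_and_mantissa(0.017453))

a, b, n = 2.5, 4.0, 3
# Product, quotient, power, and root rules, checked numerically.
assert math.isclose(math.log10(a * b), math.log10(a) + math.log10(b))
assert math.isclose(math.log10(a / b), math.log10(a) - math.log10(b))
assert math.isclose(math.log10(a ** n), n * math.log10(a))
assert math.isclose(math.log10(a ** (1 / n)), math.log10(a) / n)
```

The mantissa is kept non-negative, as in the table, so a number less than 1 receives a negative characteristic.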




352 




COMMON LOGARITHMS (continued)


N

0

1

2

3

4

5

6 

7 

8 

9 


5.0

0.6990

6998 

7007 

7016 

7024 

7033 

7042 

7050 

7059 

7067 

5.1 

7076 

7084 

7093 

7101 

7110 

7118 

7126 

7135 

7143 

7152 

5.2

7160 

7168 

7177 

7185 

7193 

7202 

7210 

7218 

7226 

7235 

5.3 

7243 

7251 

7259 

7267 

7275 

7284 

7292 

7300 

7308 

7316 

5.4 

7324 

7332 

7340 

7348 

7356 

7364 

7372 

7380 

7388 

7396 

5.5

7404 

7412 

7419 

7427 

7435 

7443 

7451

7459 

7466 

7474 

5.6 

7482 

7490 

7497 

7505 

7513 

7520 

7528 

7536 

7543 

7551 

5.7 

7559 

7566 

7574 

7582 

7589 

7597 

7604 

7612 

7619 

7627 

5.8 

7634 

7642 

7649 

7657 

7664 

7672 

7679 

7686 

7694 

7701 

5.9 

7709 

7716 

7723 

7731 

7738 

7745 

7752 

7760 

7767 

7774 

6.0

0.7782 

7789 

7796 

7803 

7810 

7818 

7825 

7832 

7839 

7846 

6.1 

7853 

7860 

7868 

7875 

7882 

7889 

7896 

7903 

7910 

7917 


6.2

7924

7931 

7938 

7945 

7952 

7959 

7966 

7973 

7980 

7987 


6.3

7993

8000 

8007 

8014 

8021 

8028 

8035 

8041 

8048 

8055 


6.4

8062

8069 

8075 

8082 

8089 

8096 

8102 

8109 

8116

8122

6.5

8129 

8136 

8142 

8149 

8156 

8162 

8169 

8176 

8182 

8189 

6.6 

8195 

8202 

8209 

8215 

8222 

8228 

8235 

8241 

8248 

8254 

6.7 

8261 

8267 

8274 

8280 

8287 

8293 

8299 

8306 

8312 

8319 

6.8 

8325 

8331 

8338 

8344 

8351 

8357 

8363 

8370 

8376 

8382 

6.9 

8388 

8395 

8401 

8407 

8414 

8420 

8426 

8432 

8439 

8445 


7.0

0.8451

8457 

8463 

8470 

8476 

8482 

8488 

8494 

8500 

8506 

7.1 

8513 

8519 

8525 

8531 

8537 

8543 

8549 

8555 

8561 

8567 

7.2

8573 

8579 

8585 

8591 

8597 

8603 

8609 

8615 

8621 

8627 

7.3

8633 

8639 

8645 

8651 

8657 

8663 

8669 

8675 

8681 

8686 

7.4 

8692 

8698 

8704 

8710 

8716 

8722 

8727 

8733 

8739 

8745 

7.5

8751

8756 

8762 

8768 

8774 

8779 

8785 

8791 

8797 

8802 

7.6 

8808 

8814 

8820 

8825 

8831 

8837 

8842 

8848 

8854 

8859 

7.7 

8865 

8871 

8876 

8882 

8887 

8893 

8899 

8904 

8910 

8915 

7.8 

8921 

8927 

8932 

8938 

8943 

8949 

8954 

8960 

8965 

8971 

7.9 

8976 

8982 

8987 

8993 

8998 

9004 

9009 

9015 

9020 

9025 

8.0 

0.9031 

9036 

9042 

9047 

9053 

9058 

9063 

9069 

9074 

9079 

8.1 

9085 

9090 

9096 

9101 

9106 

9112 

9117 

9122 

9128 

9133 

8.2

9138 

9143 

9149 

9154 

9159 

9165 

9170 

9175 

9180 

9186 

8.3 

9191 

9196 

9201 

9206 

9212 

9217 

9222 

9227 

9232 

9238 

8.4 

9243 

9248 

9253 

9258 

9263 

9269 

9274 

9279 

9284 

9289 

8.5

9294 

9299 

9304 

9309 

9315 

9320 

9325 

9330 

9335 

9340 

8.6 

9345 

9350 

9355 

9360 

9365 

9370 

9375 

9380 

9385

9390

8.7 

9395 

9400 

9405 

9410 

9415 

9420 

9425 

9430 

9435 

9440 

8.8 

9445 

9450 

9455 

9460 

9465 

9469 

9474 

9479 

9484 

9489 

8.9 

9494 

9499 

9504 

9509 

9513 

9518 

9523 

9528 

9533 

9538 

9.0 

0.9542 

9547 

9552 

9557 

9562 

9566 

9571 

9576 

9581 

9586 

9.1 

9590 

9595 

9600 

9605 

9609 

9614 

9619 

9624 

9628 

9633 

9.2 

9638 

9643 

9647 

9652 

9657 

9661 

9666 

9671 

9675 

9680 

9.3

9685 

9689 

9694 

9699 

9703 

9708 

9713 

9717 

9722 

9727 

9.4 

9731 

9736 

9741 

9745 

9750 

9754 

9759 

9763 

9768 

9773 

9.5

9777 

9782 

9786 

9791 

9795 

9800 

9805 

9809 

9814 

9818 

9.6 

9823 

9827 

9832 

9836 

9841 

9845 

9850 

9854 

9859 

9863 

9.7 

9868 

9872 

9877 

9881 

9886 

9890

9894 

9899 

9903 

9908

9.8 

9912 

9917 

9921 

9926 

9930 

9934 

9939 

9943 

9948 

9952 

9.9 

9956 

9961 

9965 

9969 

9974 

9978 

9983 

9987 

9991 

9996 









INDEX 


Numbers refer to pages. 


A 

Accuracy, degree of, 8 
false, 8 

Area, determined by definite integral, 
341 

of frequency polygon, 74 
of rectangular histogram, 73 
under curve, 76 
Array, 4 

Average, 96 (see also Mean), 
for comparison, 3 
general definition of, 96 
rate of growth, 124 
representative type form, 96 
Axis, 56 
of abscissas, 56 
of dependent variable, 56 
of independent variable, 56 
of ordinates, 56 
of parabola, 65 

B 

Base of a system of logarithms, 295 
Binomial distribution, non-symmetric, 35
symmetric, 33 

freshman heights compared 
with, 33, 86 
histograms, 77 
expansion, formula for, 310 
if n is not a positive integer, 313 
rules for forming terms of, 312 
Boundaries, class, 25 
distinction of, from class limits, 27 

C 

Characteristic of logarithms, 299 
rule for finding, 299 
Charlier check, 233 


Chart (see also Diagram), 
geometric, 254 
ratio, 256 

shows per cent change, 261 
Circle, equation of, 63 
Class, 5 

boundaries, 25 
frequency, 5 
interval, 5 

as unit of measure, 232 
effect of widening, 27, 84 
limits, 25 

distinction from class boundaries, 27
Closest fit, 186 
frequency curve of, 200 
logarithmic least squares line of, 
264 

parabola of, 196 

normal equations for, 197 
straight line of, 187 

graphical method for, 187 
least squares, 190 
method of averaging for, 188 
normal equations for least 
squares, 192 
Cologarithm, 305 
Combinations, definition of, 307 
number of, of n things r at a time, 
309 

simple relationship between, 309 
Comparison, as purpose of statistics, 3

Constant of integration, 338 
cancels in definite integral, 342 
Coordinates, 56 
axes of, 56 
three, 333 

Correlation, Chap. IX 
by ranks, 243 



Correlation, definition of, 212 
formula for Pearson’s coefficient 
of, 229 

origin at zero, 231 
short-cut, 229 

idea of, developed through probability, 214-222
measure of, 227, 228 
ratio, 236 
skew, 227, 236 
table, construction of, 231 
form of, 234 
when r equals zero, 240 
Curve fitting, Chap. VIII 
by normal probability curve, 200, 
201, 202 

by parabola, 196 
by straight line, 187 
certain type forms for, 186 

D 

Deck, 255 
Derivative, 325 

equals slope of tangent to a curve, 
325 

speed, 332 
of x^n, 325
partial, 334 
second, 330 

equals acceleration, 332 
Deviation, 5, 7 
average, 134 
quartile, 159 
standard, 137 

computation of, from frequency 
table, 142 
equals √(npq), 147
measure of consistency, 139 
mechanical interpretation of, 
139 

other names for, 139 
satisfies general definition of 
average, 147 

short-cut computation of, 144 
Deviations, sum of, from any arbitrary value, 98
from arithmetic mean, 97 
Diagram, bar, 42, 45 


Diagram, line, 51 
one hundred per cent bar, 45 
peak-top, 51 
pie, 45 

percentage, 47 
use of, 47 
scatter, 39, 213 

Dispersion, coefficient of, 135, 141, 159

comparison of measures of, 161 
for comparison, 4 
idea of, 131 

measure of, average deviation as, 
134 

range as, 132 

standard deviation as, 137 
quartile measure of, 158 
Division points, 150 
formulas for, 155 
graphical determination of, 161 
in general, 153 
median, 111, 151 
percentiles, 153 
quartiles, 150, 151, 152 
significance of, 153 
Double entry, table of, 35, 36 

E 

Equation, of circle, 63 
normal probability curve, 177, 
185, 200 
parabola, 64 
of closest fit, 196, 197 
straight-line, 57, 59, 60, 62 
of closest fit, 187, 188, 190, 222
regression lines, 222, 224, 225 

F 

Forecasting, 260, 266 
Frequency, 25 
classes, 160 
curve, 74 
area of, 76 
drawing of, 75 
fitting of, 76, 84, 200, 204
symmetric binomial, 77 
distributions, comparison of, 29 
due to pure chance, 29 
polygon, 73 
area of, 74 



Frequency, tables, 39 
theoretical, in tossing coins, 31 
Function, 50, 324 
derivative of, 325 
gives slope of tangent, 325 
graph of, 51 

increasing and decreasing, 329 
linear, 61 

maximum value of, 328 
test for, 330 
minimum value of, 329 
test for, 330 

of a function, derivative of, 332 
rule for, 333 
of two variables, 333 
application to least squares 
adjustment, 337 
graph of, 334 

maximum or minimum values 
of, 335 

plotting of, 53 
second degree, 65 

G 

Geometric mean, 122 
as average rate of change, 265 
connection, with arithmetic mean, 
123 

with harmonic mean, 127 
slope of logarithmic straight line, 
266 

Gompertz curve, 267
Graduation, 198 
of frequency distribution, 200 
of freshman heights to normal 
probability curve, 204 
Graph, area of, 73, 74, 76 (see also
Diagram)
frequency, 71 

general rules for making, 89 
logarithmic, Chap. X 

reading of scale numbers on, 255 
unlimited, 255 
object of, 42 

H 

Heads and tails, number of in 500 
throws of seven dimes, 16 
Histogram, 71 


Histogram, from pure chance, 76 
rectangular, 72 
area of, 73, 74 
symmetric binomial, 77 

comparison of, with freshman 
heights, 86 

I 

Index numbers, Chap. XI 
as type forms, 293 
average price relative used as, 277 
averages used in computation of, 
280 

bias in formulas for, 288 
crossing of formulas for, 288 
definition of, 276 
Fisher’s “Ideal,” 289
forms of computation of, 290 
notation used in formulas, 276 
simple, 275 

systems of weighting used for, 281
tests, of good formulas for, 282 
commodity reversal, 282 
factor reversal, 283 
time reversal, 282 
warning as to use of, 292 
Integration, between limits, 341 
constant of, 338 
definition of, 338 
geometric meaning of, 339 
Interpolation, by A. M. and G. M., 
124 

graphical, 51, 52 

in table of logarithms, 301 

K 

Kurtosis, 206 

L 

Least squares adjustment, 190 
formulas, derivation of, 337 
for parabola of, 197 
for straight line of, 192 
logarithmic line of, 200, 264 
Limit of a variable, 321 

illustration of, reached by the 
variable, 321 

that cannot be reached by the 
variable, 321 




Limits, class, 25 
of definite integral, 341, 342 
Logarithms, base of system of, 295 
other than 10, 303 
ten, 297 

characteristic of, 298 
rule for, 299 
definition of, 295 
interpolation in table of, 301 
laws of, 295 
mantissa of, 297 
independent of decimal point, 
300 

modulus of system of, 304 
natural or Napierian, 304 
Lorenz curve, 162 
interpretation of, 164 

M 

Mantissa, 297 

independent of decimal point, 
300 

Maps, statistical, 89 
Maximum value of a function, 328 
test for, 330 
Mean, arithmetic, 96 

determination by frequency 
curve, 119 

determination by frequency 
table, 99 
short-cut, 100 
important property of, 97 
not always representative, 102 
satisfies general definition of 
average, 97 

short-cut for computing, 98 
sum of deviations from equals 
zero, 97 
weighted, 121 
geometric, 122 

connection with A. M., 123 
graphical determination, 265 
rate of growth by, 124 
harmonic, 126 

connection between A. M. and 
G. M., 127 

satisfies general definition of 
average, 128 


Median, 111 

definition of, more precise, 112, 
151 

differences from, sum of, a minimum, 117

formula for, by interpolation, 114, 
155 

frequency curve, determination 
by, 119 

ogive, determination by, 116 
satisfies general definition of average, 118

Minimum value of a function, 329 
test for, 330 
Mode, 103 

effect on, by shifting class limits, 
110 

formula for, 103 

frequency graph, determined by, 
103 

grouping, determination by, 105, 
107 

ogive, determination by, 109 
Moving average, 207 
graph of, or line of trend, 209 

N 

Normal equations, for parabola of 
closest fit, 197 

for straight line of closest fit, 192 
other type forms of, 199 
probability curve, 82, 177, 185, 
200, 201 

O 

Ogive, 87 

comparison of, with histogram, 88 
division points determined by, 161 
median determined by, 116 
mode determined by, 109 
Organic growth, law of, 257 

P 

Parabola, equation of, 64 
of closest fit, 196 

normal equations for, 197 
use of, 65 



Pareto’s law, 268 
Pascal’s triangle, 312 
Permutations, definitions of, 307 
value of, of n things r at a time, 308 
Pictogram, 48 
use with caution, 50 
Price relative, 277 

average, used as index number, 277

Probability, compound, 315 
definition of, 315 
laws of, 30 
proof of, 315-320 
normal curve of, 82, 177, 185, 200, 
201 

area under, 178 
partial, 317 
total, 318 

used to develop idea of correlation, 214-222

Probable error, definition of, 177 
equals 0.674 of standard deviation, 179, 180, 203
examples of use of, 182 
odds against a random selection 
being beyond three times, 181 
of various statistical constants, 
181 

value of, depends on N, 183

Q 

Quartile, 150 

coefficient of dispersion, 159 
first, 151 

measure of dispersion, 159 
third, 152 

R 

Ranks, correlation by, 243 

derivation of formula for, 244 
definition of, 243 
Regression coefficients, 225, 226
product of, a measure of correlation, 227
equations, 222 to 225 
referred to axes through (0,0), 
240 

linear, 227 
departure from, 239 


Regression lines, 222, 225 
and correlation, 227 
coincidence of, 227 
meaning of, 225 
on coordinate axes, 227 
plotting of, 242 

S 

Sampling, 4, 74 
Saturation point, 258, 261 
Scale, choosing of, 89, 90 
logarithmic, 68, 252 
marks to be on heavy lines, 90 
of squares, 66 

uniform and non-uniform, 66 
Scatter diagram, 39, 213 
Skewness, 169 

coefficient of, 171, 172 
effect on mode, median, and arithmetic mean, 169
extreme forms of, 175 
limits on, in fitting normal curve, 
206 

measure of, 170, 172 
significant degree of, 175, 183 
Slope of logarithmic straight line of 
closest fit, 266 
Smoothing, 52, 75, 76 
Standard deviation, 137 

chances of a random choice falling 
beyond certain multiples of, 
179, 203 

computation of, direct method, 
142 

short-cut method, 144 
equals √(npq), 147
is an average, 147 
measure of consistency, 139 
mechanical interpretation of, 139 
other names for, 139 
Statistical regularity, law of, 4 
Statistics, definitions of, dictionary, 
1 

Yule’s, 2 

Straight line, equation of, intercept 
form of, 60 

not through the origin, 59 
through the origin, 57 
two points, 62




Straight line of closest fit, 187 

graphical method of finding, 187 
least squares, 190 
formulas derived for, 192 
principle involved, 190 
simplified formulas for, 193 
method of averaging, 188 
normal equations for, 192 
slope of, 58 

T 

Tabulation, Chap. II 
Tangent to a curve, definition of, 323 
slope of, 324 

equals derivative of function, 
325 


Trend, line of, by moving average, 
209 

logarithmic least squares line of, 
264 

V 

Variable, continuous, 3 
definition, 3 
dependent, 50 
discrete, 3 
historical, 3 
independent, 50 
not reaching limit, 321 
reaching the limit, 321 
Variates, definition of, 3 
Variation, time, place, quality, 23, 
24 



